1. Introduction. In certain applications, such as bioassay, sensitivity testing, or
fatigue trials, the statistician is often interested in estimating a given quantile of
a distribution function on the basis of data of the zero-one type. For
example, suppose $F(x)$ denotes the probability that a metallic test specimen will
fracture if subjected to $x$ cycles in a fatigue trial. Then a specimen, when tested
in such a way, represents an observation which takes on the value one or zero
depending on whether or not it fractures. It is of interest to estimate that number
of cycles $x$ such that, for a given $\alpha$, $F(x) = \alpha$. The techniques of possible use in
this connection, such as probit analysis [8] and the “up and down” method of
Dixon and Mood [6], depend to a great extent on parametric assumptions concerning
the distribution function $F(x)$. Robbins and Monro [13] considered a
problem of which the above problem, with or without parametric assumptions,
is a special case. Suppose that for every real value $x$, the random variable $Y(x)$,
denoting the value of a response to an experiment carried out at a controlled
level $x$, has the unknown distribution function $H(y \mid x)$ and regression function
$M(x) = \int_{-\infty}^{\infty} y \, dH(y \mid x)$. Let $\alpha$ be any given real number. Robbins and
Monro considered the problem of estimating the root of the equation $M(x) = \alpha$,
assuming the existence of a unique root. If $Y(x) = 1$ or $0$ with probabilities
$F(x)$ and $1 - F(x)$ respectively, where $F(x)$ is a distribution function and
$0 < \alpha < 1$, then $M(x) = F(x)$, and we have the above special case.

The problem of estimating a root of a given regression function has its counter- 
part in the literature of the more classical mathematics. Newton’s method of 
approximation is, perhaps, the best-known iterative procedure used for such a 
problem when no random element is present. However, even if $Y(x) = M(x)$
with probability one—i.e., if no randomness exists—Newton’s method is not 
applicable; for Newton’s method and other classical procedures depend on 
knowing the functional form of M(x), whereas, here, such knowledge is not 
assumed. 

Because of the nonparametric nature of the problem, a method of approach
not based on the usual curve-fitting techniques is clearly necessary. As a
solution, Robbins and Monro put forward the following iterative scheme. Let
$\{a_n\}$ $(n \geq 1)$ be a fixed sequence of positive constants such that

(1.1)    $\sum_{n=1}^{\infty} a_n = \infty, \qquad \sum_{n=1}^{\infty} a_n^2 < \infty.$

The sequence $a_n = 1/n$, for example, satisfies (1.1). Let $x_1$, the level of the first
experiment, be arbitrary. Succeeding levels are defined recursively by

(1.2)    $x_{n+1} = x_n + a_n(\alpha - y_n),$

where $y_n$ denotes the response at level $x_n$, a random variable dependent only on
$x_n$ and having the distribution function $H(y \mid x_n)$. Thus at each stage of
experimentation a new level is chosen, based upon the deviation of the previous
response from $\alpha$ and on the number of experiments already performed.
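
The recursion (1.2) can be sketched in a few lines of Python for the quantal response case. This is only an illustrative simulation under stated assumptions: the logistic response curve, the target level 0.75, the starting level, and the step constant `c = 5` (so that the steps are `c/n`, which also satisfy (1.1)) are hypothetical choices made here, not taken from the paper.

```python
import math
import random


def robbins_monro(respond, alpha, x1, n_steps, c=1.0):
    """Iterate x_{n+1} = x_n + a_n * (alpha - y_n) with a_n = c / n."""
    x = x1
    for n in range(1, n_steps + 1):
        y = respond(x)  # response observed at the current level x_n
        x += (c / n) * (alpha - y)
    return x


def logistic_cdf(x):
    # Hypothetical response curve F(x), chosen only for illustration.
    return 1.0 / (1.0 + math.exp(-x))


rng = random.Random(0)


def zero_one_response(x):
    # y = 1 with probability F(x) and y = 0 otherwise (quantal response).
    return 1.0 if rng.random() < logistic_cdf(x) else 0.0


alpha = 0.75
theta = math.log(alpha / (1.0 - alpha))  # exact root of F(x) = alpha for this F
estimate = robbins_monro(zero_one_response, alpha, x1=0.0, n_steps=100_000, c=5.0)
print(f"theta = {theta:.4f}, estimate = {estimate:.4f}")
```

Only the zero-one responses enter the update; the functional form of the response curve is used solely to generate the simulated data.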

Since the proposal of their scheme, considerable attention has been focused in 
this direction. Some of this attention has been directed towards establishing 
conditions under which the Robbins-Monro procedure is reasonable; some has 
been directed towards treating similar problems with different but analogous 
schemes; and some has been directed towards providing a more general theory of
stochastic approximation. This paper is an exposition of work done along these
lines.


2. The Robbins-Monro Process. Let $\theta$ be the root of the equation $M(x) = \alpha$.
Robbins and Monro [13] proved that $x_n$ defined by (1.2) converges in the mean
to $\theta$, i.e., $\lim_{n \to \infty} E(x_n - \theta)^2 = 0$, in two separate cases. In one case the function
$M(x)$ is discontinuous at $\theta$, with $|M(x) - \alpha|$ bounded away from zero for
all $x \neq \theta$. (In fact, $M(\theta)$ need not equal $\alpha$.) In the other case, $M(x)$ is
nondecreasing, $M(\theta) = \alpha$, and $M'(\theta) > 0$. In both cases the rather strong condition
that $Y(x)$ be bounded with probability one for all $x$ was imposed. However, it
should be remarked that for the purpose of estimating a quantile with zero-one
data, the condition is not restrictive.

Wolfowitz [16] was the next to take up the problem. He showed that $x_n$
converges to $\theta$ in probability under weaker conditions. His most significant improvement
was to replace the boundedness condition of Robbins and Monro with the
condition that $M(x)$ and $\sigma_x^2 = \int_{-\infty}^{\infty} (y - M(x))^2 \, dH(y \mid x)$ be bounded functions
of $x$.

The following conditions, which are weaker than both the Robbins-Monro and
Wolfowitz conditions, were assumed by Blum [1]:

(2.1)    $|M(x)| \leq c + d|x|$ for some $c, d \geq 0$,

(2.2)    $\sigma_x^2 \leq \sigma^2 < \infty$ for all $x$,

(2.3)    $M(x) < \alpha$ $(x < \theta)$, $M(x) > \alpha$ $(x > \theta)$,

(2.4)    $\inf_{\delta_1 \leq |x - \theta| \leq \delta_2} |M(x) - \alpha| > 0$ for every $\delta_1, \delta_2 > 0$.

Under these assumptions Blum was able to show that $P(\lim_{n \to \infty} x_n = \theta) = 1$.
That this is true, with no stronger assumption than (2.4), is somewhat surprising.
For (2.4) allows the possibility that $M(x) \to \alpha$ as $|x| \to \infty$, and in such a case
one would expect that there might be positive probability of $|x_n|$ converging
to $\infty$.
While the proofs of Robbins and Monro and of Wolfowitz used arguments
rather special to the process under consideration, Blum’s method related to other
known results. More specifically, it can be shown using martingale theory or,
more directly, Kolmogorov’s inequality that because of (1.1) and (2.2),
$\sum_{j=1}^{\infty} a_j (y_j - M(x_j))$ converges with probability one. (See Loève [12], p. 387.)
Consequently

$x_{n+1} - \sum_{j=1}^{n} a_j (\alpha - M(x_j)) = x_1 - \sum_{j=1}^{n} a_j (y_j - M(x_j))$

converges with probability one. Imposing the conditions of (2.1) and (2.3), Blum
was able to show that $x_n$ converges with probability one to a random variable
$W$ which is finite with probability one. Then (2.4) was enough of a further
assumption to allow him to prove that $W = \theta$ with probability one.

Recently, Dvoretzky [7] has shown that under Blum’s conditions, $x_n$ also
converges in the mean to $\theta$. Dvoretzky’s work will be discussed below.

While the results of Blum and Dvoretzky show that under wide conditions the
Robbins-Monro process converges to $\theta$ both in mean square and with probability
one, it is of interest, particularly for statistical purposes, to obtain sharper
convergence theorems. To this end, Chung [4] considered two cases. In his first
(bounded) case he assumed that

(2.5)    $M(x) = \alpha + \alpha_1 (x - \theta) + o(|x - \theta|)$ $(0 < \alpha_1 < \infty)$,

(2.6)    $\inf_{|x - \theta| > \delta} |M(x) - \alpha| = K_0(\delta) > 0$ for every $\delta > 0$,

(2.7)    $P(|Y(x) - \alpha| \leq K_1 < \infty) = 1$ for all $x$,

(2.8)    $\lim_{x \to \theta} \sigma_x^2 = \sigma_\theta^2 > 0$.

Under the conditions (2.5), (2.6), (2.7), and (2.8), Chung showed that if
$a_n = 1/n^{1-\epsilon}$, where $1/[2(1 + K_0)] < \epsilon < 1/2$ ($K_0$ being a constant arising in his
analysis), then $n^{(1-\epsilon)/2}(x_n - \theta)$ tends in distribution to the normal distribution
with mean 0 and variance $\sigma_\theta^2/(2\alpha_1)$. In his second (quasi-linear) case, he replaced
(2.7) with

(2.9)    $K|x - \theta| \leq |M(x) - \alpha| \leq K'|x - \theta|$, $K > 0$, $K' < \infty$,

and

(2.10)    $\int_{-\infty}^{\infty} |y - M(x)|^p \, dH(y \mid x) \leq K(p) < \infty$, $p = 1, 2, \ldots$,

and showed that if $a_n = c/n$, $c > 1/(2K)$, then $n^{1/2}(x_n - \theta)$ tends in distribution
to the normal distribution with mean 0 and variance $\sigma_\theta^2 c^2/(2\alpha_1 c - 1)$. Both
results were proved by showing the proper convergence of moments. In earlier
papers, Kallianpur [10] and Schmetterer [14], [15] obtained certain bounds
for $E(x_n - \theta)^2$. For the most part, however, their results are contained in those of
Chung.

The question arises as to whether other limiting distributions might exist. 
Chung also showed that all stable laws are possible limiting distributions; and
furthermore, no limiting distribution need necessarily exist. 

For purposes of application, Chung’s results still left something to be desired.
Kiefer, who contributed largely to the last section of [4], remarked that, for the
quasi-linear case, if $a_n = c/n$ with $c = 1/\alpha_1$, the Robbins-Monro estimate of $\theta$
is, under certain regularity conditions, asymptotically minimax if the loss function
of an estimate $d$ is $|\theta - d|^r$, $r \geq 0$. That this is true follows with slight
modification from results obtained by Wolfowitz [17] on minimax estimation
of the mean of a normal distribution with known variance. However, the conditions
of the quasi-linear case are not satisfied if $M(x)$ is a distribution function,
as is the case in the quantal response problem. Here the bounded case is applicable,
but the estimate based on $a_n = 1/n^{1-\epsilon}$ has asymptotic efficiency zero.
Hodges and Lehmann [9], using an idea attributed to Stein, bridged the gap
between the quasi-linear case and the bounded case, proving that, in the bounded
case, $n^{1/2}(x_n - \theta)$ also converges in distribution to the normal distribution with
mean 0 and variance $\sigma_\theta^2 c^2/(2\alpha_1 c - 1)$ if $a_n = c/n$, $c > 1/(2K'')$, where
$K'' = \inf_{|x - \theta| < A} |M(x) - \alpha| / |x - \theta|$ ($A$ being any positive number such that
$K'' > 0$). It is not known whether the moments of $n^{1/2}(x_n - \theta)$ converge to the
moments of the limiting distribution. Their method was to show, using Blum’s
probability one convergence theorem, that the asymptotic distribution of $x_n$
depends only on those values of $M(x)$ defined in the neighborhood of $x = \theta$.
Within any finite interval, a function $M(x)$ satisfying the conditions of the
bounded case will also satisfy the conditions of the quasi-linear case, so that as
far as the asymptotic distributions are concerned, the two cases are the same.

Coming back to the quasi-linear case, it has been remarked that for $a_n =
1/(\alpha_1 n)$ and loss function $|\theta - d|^r$, the Robbins-Monro procedure is asymptotically
minimax over all possible procedures. Dvoretzky [7] has shown that if it
is known that $|x_1 - \theta| \leq C \leq [2\sigma^2 / (K(K' - K))]^{1/2}$, where $\sigma^2$ is defined by
(2.2), then the choice of $a_n = K C^2 / (\sigma^2 + n K^2 C^2)$ yields estimates such that

(2.11)    $E(x_n - \theta)^2 \leq \dfrac{\sigma^2 C^2}{\sigma^2 + (n - 1) K^2 C^2}$;

and if any other coefficients are used, there exist an $x_1$ and a function $M(x)$
satisfying the quasi-linear conditions such that (2.11) does not hold. Except for
the case $K = \alpha_1$, Dvoretzky’s coefficients lead to estimates having asymptotic
variance larger than that obtained by letting $a_n = 1/(\alpha_1 n)$. This loss in asymptotic
efficiency is, of course, the price paid for small-sample optimality.
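
As a plausibility check on (2.11), the bound can be compared with the exact mean square error in the simplest quasi-linear situation. The sketch below assumes a linear regression function with slope $K$ (so that $K' = K$), constant variance $\sigma^2$, and $|x_1 - \theta| = C$; these are illustrative assumptions made here, and the function names are hypothetical. In that case $x_{n+1} - \theta = (1 - a_n K)(x_n - \theta) - a_n(y_n - M(x_n))$, so the mean square error obeys a simple recursion, and with Dvoretzky’s coefficients it meets the right side of (2.11) with equality.

```python
def dvoretzky_bound(n, sigma2, K, C):
    """Right side of (2.11): sigma^2 C^2 / (sigma^2 + (n - 1) K^2 C^2)."""
    return sigma2 * C**2 / (sigma2 + (n - 1) * K**2 * C**2)


def exact_mse_linear(n_max, sigma2, K, C):
    """E(x_n - theta)^2 for n = 1..n_max when M is linear with slope K.

    Assumes constant response variance sigma2 and |x_1 - theta| = C.
    """
    mse = [C**2]  # n = 1: x_1 is a fixed starting level
    for n in range(1, n_max):
        a_n = K * C**2 / (sigma2 + n * K**2 * C**2)  # Dvoretzky's coefficients
        # The cross term vanishes because the noise has conditional mean zero.
        mse.append((1.0 - a_n * K) ** 2 * mse[-1] + a_n**2 * sigma2)
    return mse


mse = exact_mse_linear(50, sigma2=1.0, K=0.5, C=2.0)
for n in (1, 2, 10, 50):
    print(n, mse[n - 1], dvoretzky_bound(n, 1.0, 0.5, 2.0))
```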
Lehmann and Hodges raised the questions as to how much agreement exists
between asymptotic and small-sample theory and how $c$, if one uses the coefficients
$a_n = c/n$, is to be chosen if $\alpha_1 = M'(\theta)$ is unknown. Since the behavior
of the variance of the estimate is unknown for $c < 1/(2K)$, they remarked that
one would be tempted, if an a priori guess is to be made, to overestimate $c$.
They would also overestimate $c$ on the grounds that $\sigma_\theta^2 c^2/(2\alpha_1 c - 1)$, the
asymptotic variance of $n^{1/2}(x_n - \theta)$, increases more slowly for increasing $c >
1/\alpha_1$ than for decreasing $c < 1/\alpha_1$. In order to gain more insight concerning the
proper choice of $n$ and $c$, they considered the special case of a linear $M(x)$ and
constant $\sigma_x^2$. Here it is possible to compute exact variances for all $n$ and to study
the effect of varying $c$ on the exact variance. They found, for this special case,
rather close agreement between asymptotic and small-sample theory for $n = 20$
and $c > 1/\alpha_1$. However, for $c < 1/\alpha_1$ it appears that the rate of approach of the
small-sample variance to the asymptotic variance is much slower, and thus the
danger due to underestimating $c$ is, perhaps, not as great as the asymptotic
theory suggests.
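
The comparison described above can be reproduced in outline. The sketch below is not the authors’ computation; it assumes a linear $M(x)$ with slope $\alpha_1 = 1$, constant variance $\sigma^2 = 1$, a fixed starting level, and coefficients $a_n = c/n$, and the function names are hypothetical. Under these assumptions the variance satisfies $\mathrm{Var}(x_{j+1}) = (1 - c\alpha_1/j)^2 \mathrm{Var}(x_j) + c^2\sigma^2/j^2$, which is compared with the asymptotic value $\sigma^2 c^2/(2\alpha_1 c - 1)$ after 20 observations.

```python
def exact_variance(n, c, alpha1=1.0, sigma2=1.0):
    """Variance of the estimate after n observations, linear M, a_j = c / j."""
    var = 0.0  # the starting level x_1 is fixed, so it has no variance
    for j in range(1, n + 1):
        var = (1.0 - c * alpha1 / j) ** 2 * var + (c / j) ** 2 * sigma2
    return var


def asymptotic_variance(c, alpha1=1.0, sigma2=1.0):
    """Limiting variance of sqrt(n) (x_n - theta); requires c > 1 / (2 alpha1)."""
    return sigma2 * c**2 / (2.0 * alpha1 * c - 1.0)


n = 20
for c in (0.6, 1.0, 2.0):
    ratio = n * exact_variance(n, c) / asymptotic_variance(c)
    print(f"c = {c}: (n * exact variance) / (asymptotic variance) = {ratio:.3f}")
```

In this special case the ratio is close to one for $c \geq 1/\alpha_1$ and well below one for the smaller value of $c$, in line with the remark that the small-sample variance approaches the asymptotic variance slowly when $c < 1/\alpha_1$.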


3. The Kiefer-Wolfowitz Process. Kiefer and Wolfowitz [11] considered the
problem of estimating the value of $x = \theta$ such that $M(x)$ is maximum, assuming
the existence of a unique maximum. They suggested the following scheme. Let
$\{a_n\}$ and $\{c_n\}$ be sequences of positive numbers such that

(3.1)    $c_n \to 0$, $\sum_{n=1}^{\infty} a_n = \infty$, $\sum_{n=1}^{\infty} a_n c_n < \infty$, $\sum_{n=1}^{\infty} a_n^2 c_n^{-2} < \infty$.

For example, $a_n = 1/n$, $c_n = 1/n^{1/3}$ are such sequences. Let $x_1$ be arbitrary. Then
define recursively

(3.2)    $x_{n+1} = x_n + a_n c_n^{-1} (y_{2n} - y_{2n-1}),$

where $y_{2n-1}$ and $y_{2n}$ are independent random variables with respective distributions
$H(y \mid x_n - c_n)$ and $H(y \mid x_n + c_n)$. They proved under certain regularity
conditions that $x_n$ converges in probability to $\theta$. Using the same method as with
the Robbins-Monro process, Blum [1], under weaker conditions, showed that the
process converges with probability one. It also turns out that the condition
$\sum a_n c_n < \infty$ is unnecessary. The conditions of Blum and of Kiefer and Wolfowitz
are of such a nature that functions like $M(x) = a - x^2$ are ruled out. Derman
[5] considered functions which might be called “quasi-parabolic,” analogous to
Chung’s quasi-linear functions, i.e., functions whose difference quotients lie
between two straight lines with positive slopes. Functions like $-x^2$ satisfy these
conditions. It was shown in such cases that $x_n$ converges in the mean to $\theta$ and, in
more restrictive cases where $M(x)$ is locally parabolic at $\theta$, there is, with proper
normalization, convergence to the normal distribution. Burkholder [3] also
obtained results pertaining to asymptotic normality.
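
The recursion (3.2) can be sketched as follows. This is an illustrative simulation only: the sequences $a_n = 1/n$ and $c_n = n^{-1/3}$ are one standard choice, while the parabolic regression function (of the quasi-parabolic type discussed above), the Gaussian noise, the starting level, and all names are hypothetical assumptions made here.

```python
import random


def kiefer_wolfowitz(observe, x1, n_steps):
    """Iterate x_{n+1} = x_n + (a_n / c_n) * (y_{2n} - y_{2n-1})."""
    x = x1
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n
        c_n = n ** (-1.0 / 3.0)
        y_lower = observe(x - c_n)  # y_{2n-1}, taken at x_n - c_n
        y_upper = observe(x + c_n)  # y_{2n}, taken at x_n + c_n
        x += (a_n / c_n) * (y_upper - y_lower)
    return x


rng = random.Random(1)
theta = 2.0  # location of the maximum of the hypothetical M(x) below


def noisy_parabola(x):
    # Hypothetical regression function M(x) = -(x - theta)^2 plus Gaussian noise.
    return -((x - theta) ** 2) + rng.gauss(0.0, 0.5)


estimate = kiefer_wolfowitz(noisy_parabola, x1=0.0, n_steps=20_000)
print(f"theta = {theta}, estimate = {estimate:.3f}")
```

Two observations are consumed per step, which is the feature of the process that motivates the search for single-observation procedures.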

The weakest set of conditions for convergence of the Kiefer-Wolfowitz process
which allows both $e^{-x^2}$ and $-x^2$ was given by both Burkholder and Dvoretzky.
Burkholder proved probability one convergence and Dvoretzky proved both
convergence with probability one and in the mean square. These conditions in
Dvoretzky’s form are as follows:

(3.3)    $|M(x + 1) - M(x)| < A|x| + B < \infty$ for all $x$ and some $A$, $B$,

(3.4)    $\sup_{1/k < x - \theta < k} \overline{D}M(x) < 0$, $\inf_{1/k < \theta - x < k} \underline{D}M(x) > 0$ for $k = 1, 2, \ldots$,

(3.5)    $\sigma_x^2 \leq \sigma^2 < \infty$,

where $\overline{D}M(x)$ and $\underline{D}M(x)$ denote the upper and lower derivates of $M(x)$ at $x$ and
$\sigma_x^2$ is as in (2.2).

An undesirable feature of the Kiefer-Wolfowitz process is that two observations
must be taken at each stage of experimentation. This of course raises the
question of whether there exists a procedure having desirable convergence
properties which requires only one observation at each stage.³ No such procedure
has yet been suggested. However, the approach taken by Dvoretzky appears as
if it might allow results in this direction. In the cases that Derman considered, it
also turns out that the Kiefer-Wolfowitz procedure yields estimates which have
zero asymptotic efficiency; i.e., if $x_n$ is any estimate based on one set of coefficients
$\{a_n\}$ and $\{c_n\}$, there exists another estimate $x_n'$ based on coefficients
$\{a_n'\}$ and $\{c_n'\}$ such that $\lim_{n \to \infty} E(x_n' - \theta)^2 / E(x_n - \theta)^2 = 0$. Thus a better
procedure seems desirable from this point of view. It is of some interest to note
that for cases where $M(x)$ is symmetric about $\theta$, the Robbins-Monro procedure
may be used. More explicitly, let $\epsilon$ be a small positive number and let
$\bar{M}(x) = M(x + \epsilon) - M(x - \epsilon)$ and $\bar{y}_n = y_{2n} - y_{2n-1}$, where $y_{2n}$ and $y_{2n-1}$ are
observations at $(x + \epsilon)$ and $(x - \epsilon)$ respectively. Then, since $\bar{M}(x)$ is a monotone
function of $x$, and $\theta$ is the value of $x$ such that $\bar{M}(x) = \alpha = 0$, the Robbins-Monro
procedure $x_{n+1} = x_n - a_n \bar{y}_n$ is applicable. Burkholder has pursued this
idea further into cases where $M(x)$ is not necessarily symmetric. In such cases, if
$x_n$ converges, it converges to a constant which will, in general, differ from $\theta$.


4. Other Procedures. Blum [2] has considered multidimensional analogues to
the above problems. Let $Y(x)$ be a $k$-dimensional random vector with joint distribution
function $H(y \mid x)$, where $x$ is also a $k$-dimensional vector, and let
$M(x)$ denote the expectation of $Y(x)$, where by this we mean that the $i$th component
of $M(x)$ is the expectation of the $i$th component of $Y(x)$. Conditions were
found to ensure that, for a given vector $\alpha$, a multidimensional version of the
Robbins-Monro procedure converges with probability 1 to a vector $x = \theta$,
where $M(\theta) = \alpha$. Suppose $Y(x)$ is a random variable which is dependent on $x$, a
$k$-dimensional vector, and has expectation $M(x)$, a function of $x$ assumed to have
a unique maximum. Conditions were also found such that a generalization of the
Kiefer-Wolfowitz procedure ($k + 1$ observations at each stage) would yield
estimates converging to the vector $x = \theta$, where $M(\theta)$ is maximum. Martingale
theory was employed in the convergence proofs.

Returning to one dimension, Burkholder [3] investigated a process slightly
more general than either the Robbins-Monro or the Kiefer-Wolfowitz process.
Burkholder’s process is of the form

(4.1)    $x_{n+1} = x_n + a_n z_n,$

where $\{a_n\}$ is a sequence of positive numbers and $z_n$ is a random variable with
distribution function $H_n(z \mid x_n)$; i.e., the distribution functions and therefore the
regression functions depend on $n$. For example, $M_n(x) = (1/c_n)(M(x + c_n) -
M(x - c_n))$ in the Kiefer-Wolfowitz procedure is a function of $n$. Using methods
of Blum, Chung, and Lehmann and Hodges, Burkholder was able to prove
various convergence theorems concerning his process. His results carry over to
situations where $x_n$ converges to a nonconstant random variable, this occurring
when there is no uniqueness of, say, the root of $M(x) = \alpha$ or the maximum of
$M(x)$. As a special case of his process, he exhibited a procedure which converges
to the point of inflection of a function; e.g., estimating the maximum of a density
function with zero-one data is a particular application of such a procedure.
Another application permitted by his more general procedure is that of estimating,
by taking additional observations at each stage, unknown constants of
interest such as $\sigma^2$ and $\alpha_1$, arising in Section 2.

³ As a matter of fact, this was the original problem concerning the maximum of a regression
function posed by H. Robbins.


5. A more general approach to stochastic approximation. A more general
approach taken by Dvoretzky [7], viewing a stochastic approximation procedure
as a convergent deterministic procedure with a superimposed random element,
has proved to be enlightening. For example, suppose $T_n(x_1, \ldots, x_n)$ is any
transformation of an $n$-dimensional Euclidean space $E_n$ into the real numbers
such that for some $x = \theta$,

(5.1)    $|T_n(x_1, \ldots, x_n) - \theta| \leq F_n |x_n - \theta|$ for all $(x_1, \ldots, x_n) \in E_n$,

where $F_n$ is a sequence of positive numbers satisfying

(5.2)    $\prod_{n=1}^{\infty} F_n = 0.$

Let

(5.3)    $x_{n+1} = T_n(x_1, \ldots, x_n) + Y_n,$

where $Y_n$ $(n = 1, 2, \ldots)$ are random variables such that $E(Y_n \mid x_1, \ldots, x_n) = 0$
and $\sum_{n=1}^{\infty} \sigma_n^2 = \sum_{n=1}^{\infty} E Y_n^2 < \infty$. Then, putting $V_n^2 = E(x_n - \theta)^2$ and using
(5.1), we have

(5.4)    $V_{n+1}^2 \leq F_n^2 V_n^2 + \sigma_n^2.$

Let $b_{\nu,n} = \prod_{j=\nu+1}^{n} F_j^2$. On iterating (5.4) we get

(5.5)    $V_{n+1}^2 \leq \sum_{\nu=1}^{n-1} \sigma_\nu^2 b_{\nu,n} + \sigma_n^2 + V_1^2 b_{0,n}.$

For every fixed $\nu$, $b_{\nu,n} \to 0$ as $n \to \infty$ by (5.2); and since $\sum_{n=1}^{\infty} \sigma_n^2 < \infty$, it
follows, assuming $V_1^2 < \infty$, that the right side of (5.5), and consequently the
left, tends to 0 as $n \to \infty$. Thus any stochastic approximation procedure given
by (5.3), with $T_n$ satisfying (5.1) and (5.2) and $x_1$ chosen such that $V_1^2 < \infty$,
yields an estimate which converges in the mean to $\theta$. For the Robbins-Monro
scheme, $T_n = x_n + a_n(\alpha - M(x_n))$ and $Y_n = a_n(M(x_n) - y_n)$. Under certain
restrictive conditions, (5.1) and (5.2) hold. However, in order that this approach
be more generally applicable, it is necessary to weaken condition (5.1). For
example, such a weakening is that for sequences of non-negative real numbers
$\alpha_n$, $\beta_n$, and $\gamma_n$ satisfying $\lim_{n \to \infty} \alpha_n = 0$, $\sum_{n=1}^{\infty} \beta_n < \infty$, $\sum_{n=1}^{\infty} \gamma_n = \infty$,

(5.6)    $|T_n(x_1, \ldots, x_n) - \theta| \leq \max(\alpha_n, (1 + \beta_n)|x_n - \theta| - \gamma_n).$

A further weakening permits $\alpha_n$, $\beta_n$, $\gamma_n$ to depend on $x_1, \ldots, x_n$. Under such
conditions Dvoretzky was able to prove that the process (5.3) converges to $\theta$ in
the mean and with probability one. These conditions are weak enough to apply to
the Robbins-Monro and Kiefer-Wolfowitz processes, yielding results mentioned
above.
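
The passage from (5.4) to (5.5) can be checked numerically. The sketch below iterates (5.4) with equality and compares the result with the right side of (5.5), using the hypothetical choices $F_n = n/(n+1)$ (whose infinite product is zero) and $\sigma_n^2 = 1/n^2$ (whose sum is finite); these sequences and the function names are assumptions made purely for illustration.

```python
import math


def iterate_bound(F, sigma2, V1_sq, n):
    """Apply (5.4) with equality n times; returns the bound on V_{n+1}^2."""
    bound = V1_sq
    for j in range(1, n + 1):
        bound = F(j) ** 2 * bound + sigma2(j)
    return bound


def closed_form_bound(F, sigma2, V1_sq, n):
    """Right side of (5.5), with b_{nu,n} = prod_{j=nu+1}^{n} F_j^2."""

    def b(nu):
        return math.prod(F(j) ** 2 for j in range(nu + 1, n + 1))

    total = sum(sigma2(nu) * b(nu) for nu in range(1, n))
    return total + sigma2(n) + V1_sq * b(0)


def F(n):
    return n / (n + 1.0)  # hypothetical F_n; the partial products are 1 / (n + 1)


def sigma2(n):
    return 1.0 / n**2  # hypothetical sigma_n^2, summable


print(iterate_bound(F, sigma2, 4.0, 200), closed_form_bound(F, sigma2, 4.0, 200))
print(iterate_bound(F, sigma2, 4.0, 10_000))  # the bound shrinks as n grows
```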

One might expect, then, that whenever a deterministic iteration
procedure converges, its stochastic counterpart given by (5.3) will also converge.
Dvoretzky constructed a counterexample to show that this is not the case. Thus
the conditions $E(Y_n \mid x_1, \ldots, x_n) = 0$ and $\sum_{n=1}^{\infty} E Y_n^2 < \infty$ are not strong
enough to allow conditions like (5.6) to be removed.

A further advantage of this general approach is that the convergence theorems 
hold, with appropriate changes, if x is an element of a normed linear space. Such 
generality is useful since, in many applications, x will not be a one-dimensional 
variable. For example, the multidimensional cases treated by Blum can be con- 
sidered from this point of view. 
CONSISTENCY OF THE MAXIMUM LIKELIHOOD ESTIMATOR IN THE
PRESENCE OF INFINITELY MANY INCIDENTAL PARAMETERS


By J. Kiefer¹ and J. Wolfowitz²
Cornell University 


Summary. It is shown that, under usual regularity conditions, the maximum 
likelihood estimator of a structural parameter is strongly consistent, when 
the (infinitely many) incidental parameters are independently distributed 
chance variables with a common unknown distribution function. The latter is 
also consistently estimated although it is not assumed to belong to a parametric 
class. Application is made to several problems, in particular to the problem of 
estimating a straight line with both variables subject to error, which thus after 
all has a maximum likelihood solution. 


1. Introduction. Let $\{X_{ij}\}$, $i = 1, \ldots, n$, $j = 1, \ldots, k$, be chance variables
such that the frequency function of $X_{i1}, \ldots, X_{ik}$ is $f(x \mid \theta, \alpha_i)$ when $\theta$ and $\alpha_i$
are given, and thus depends upon the unknown (to the statistician) parameters
$\theta$ and $\alpha_i$. The parameter $\theta$, upon which all the distributions depend, is called
“structural”; the parameters $\{\alpha_i\}$ are called “incidental”. Throughout this paper
we shall assume that the $X_{ij}$ are independently distributed when $\theta, \alpha_1, \ldots,
\alpha_n$ are given, and shall consider the problem of consistently estimating $\theta$ (as
$n \to \infty$). The chance variables $\{X_{ij}\}$ and the parameters $\theta$ and $\{\alpha_i\}$ may be vectors.
However, for simplicity of exposition we shall throughout this paper,
except in Example 2, assume that they are scalars. Obvious changes will suffice
to treat the vector case.

Very many interesting problems are subsumed under the above formulation.
Among these is the following:

(1.1)    $f(x \mid \theta, \alpha_i) = (2\pi\theta)^{-k/2} \exp\left\{-\frac{1}{2\theta} \sum_{j} (x_j - \alpha_i)^2\right\}.$

Suppose now that the $\{\alpha_i\}$ are considered as unknown constants and we form
in the usual manner the likelihood function

(1.2)    $(2\pi\theta)^{-kn/2} \exp\left\{-\frac{1}{2\theta} \sum_{i,j} (X_{ij} - \alpha_i)^2\right\}$

corresponding to (1.1). Then the maximum likelihood (m.l.) estimator of $\theta$ is

(1.3)    $\dfrac{\sum_{i,j} (X_{ij} - \bar{X}_i)^2}{kn}$

with $\bar{X}_i = k^{-1} \sum_j X_{ij}$, and is obviously not consistent (it converges to
$(k - 1)\theta/k$). This example is due to
Neyman and Scott [1], who used it to prove that the m.l. estimator³ need not be
consistent when there are infinitely many incidental parameters (constants).
The latter authors, to whom the names “structural” and “incidental” are due,
seem to have been the first to formulate the general problem. Special forms of the
problem, like Example 2 below, had been studied for a long time (e.g., Wald
[2] and the literature cited there).
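
The inconsistency of (1.3) is easy to see in a small simulation. The following sketch assumes normal observations with variance $\theta = 1$, $k = 2$ observations per incidental parameter, and arbitrarily chosen constants $\alpha_i$; these values and the function name are illustrative assumptions made here. With $k = 2$ the estimator settles near $(k - 1)\theta/k = 1/2$ rather than near $\theta$.

```python
import random


def neyman_scott_mle(theta, alphas, k, rng):
    """Estimator (1.3): sum_{i,j} (X_ij - Xbar_i)^2 / (k n), for simulated data."""
    total = 0.0
    for a in alphas:
        # k observations sharing the incidental parameter a, each with variance theta.
        xs = [rng.gauss(a, theta**0.5) for _ in range(k)]
        mean = sum(xs) / k
        total += sum((x - mean) ** 2 for x in xs)
    return total / (k * len(alphas))


rng = random.Random(2)
alphas = [rng.uniform(-10.0, 10.0) for _ in range(20_000)]  # arbitrary constants
theta_hat = neyman_scott_mle(theta=1.0, alphas=alphas, k=2, rng=rng)
print(f"true theta = 1.0, m.l. estimate = {theta_hat:.3f}")
```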

The general fact that, when the $\{\alpha_i\}$ are unknown constants, the m.l. estimator
of $\theta$ need not be consistent, is certainly basically connected with the fact that,
since there are only a constant number of observations which involve a particular
$\alpha_i$, it is in general impossible to estimate the $\{\alpha_i\}$ consistently. Now there are
many meaningful and practical statistical problems where the $\{\alpha_i\}$ are not
arbitrary constants but independently and identically distributed chance
variables with distribution function (df) $G_0$ (unknown to the statistician). The
question then arises whether the m.l. method, which does not always yield a consistent
estimator when there are infinitely many incidental constants, and does
yield consistent estimators in the classical parametric case where there are no
incidental parameters, will give a consistent estimator in this case, where the
$\{\alpha_i\}$ are independent chance variables with the common df $G_0$. This note is devoted
to this question.

The answer is affirmative. Not only is the m.l. estimator of $\theta$ strongly consistent
(i.e., converges to $\theta$ with probability one) under reasonable regularity
conditions, but also the m.l. estimator of $G_0$ converges to $G_0$ at every point of continuity
of the latter, with probability one (w.p.1). This is the more striking when
one recalls that $G_0$ does not belong to a parametric class, i.e., a set of df’s indexed
by a finite number of parameters. (If $G_0$ were a member of such a given class, the
problem would fall completely in the domain of classical maximum likelihood.)
The interest of the present authors was originally in estimating $\theta$. That $G_0$ can also
be estimated by the m.l. method is a felicitous by-product of our investigation.
A heuristic explanation of the present result may be this: A sequence of chance
variables is more “regular” than an arbitrary sequence of numbers. In the present
procedure one does not attempt to determine the particular values of the chance
variables $\{\alpha_i\}$, but only their distribution function; thus, we seek the m.l.
estimator of the “parameter” $\gamma = (\theta, G)$ based on a sequence of independent
random variables whose common distribution function is indexed by $\gamma$.

In sections 3, 4, and 5, the results are applied to three problems which seem to
be of interest per se. Among these is the problem of fitting a straight line with
both variables subject to normal error. This problem has a very long history and
has been the subject of many investigations (see, for example, [2], [7], [4], and the
literature cited there); it seems interesting that it can, after all, be treated by the
m.l. method. The verification of the regularity assumptions or the formulation of
not too onerous conditions for them to be verified is sometimes not entirely obvious,
and the verification of these assumptions (in the form used in Section 2)
constitutes the main difficulty of the paper. As is explained in detail below, the
fact that these assumptions imply the general consistency result of Section 2
follows from a modification of the proof of [5]. Professor Herbert Robbins has
kindly called our attention to his abstract in Ann. Math. Stat., vol. 21 (1950), p.
314, Abstract 35, which states that the m.l. estimator of $G$ is consistent. Since
nothing further has appeared on this subject, the intended restrictions under
which the statement is true, and the intended method of proof, are unknown to
the present authors. This seems to be the second instance in the literature where
the m.l. estimator has been used to estimate an entire df which is not assumed to
belong to a class depending only on a finite number of real parameters. The first
instance of the employment of such an estimator is the classical estimation of a
df by its empiric df (shown to be asymptotically optimal in [3]), which is its m.l.
estimator (see the paragraph preceding the lemma in Section 2). The only other
instance of the estimation of a df in the nonparametric case seems to be that of the
estimation of identifiable df’s in stochastic structures such as those of the present
paper by means of the minimum distance method [4].⁴ (The latter requires
regularity conditions weaker than those of the present paper. Compare, for example,
[4] with Example 2 below; see also Example 3a.)

³ Throughout this paper, for the sake of brevity, we use the term “estimator” to mean
“sequence of estimators for $n = 1, 2, \ldots$.”

In connection with these examples, and also in Section 6, we give some examples
which illustrate the fact that the classical m.l. estimator may not be consistent,
even in parametric examples which lack the pathological discontinuity
sometimes present in hitherto published examples.

Section 6 also contains the statement of a simple device which can be used in 
the classical parametric case as well as in the case studied in this paper, to prove 
consistency of the m.l. estimator in some cases where the assumptions used in 
published proofs of consistency are not satisfied. 

The proof in Section 2 is a modification of Wald’s [5], and its fundamental ideas 
are to be found in [5]; for this reason some of its details will be omitted. Wald 
states in his paper that his method applies more generally when his Assumption 
9 is fulfilled. However, this assumption is not fulfilled in our problem ab initio 
and some technical modifications have to be made. One obstacle to extending 
Wald’s proof to our problem is in establishing an analogue of (16) in [5]; one
“neighborhood of infinity” does not always seem to suffice. Also some changes in
the assumptions are made necessary by the nature of our problem. The results of
the present paper can be extended in the usual manner to abstract spaces, but we 
forego this. It should also be remarked that in [6] Wald studied the present 
problem of estimating a structural parameter. 

The attitude towards the $\{\alpha_i\}$, i.e., whether they are to be regarded as unknown
constants or identically and independently distributed chance variables
or something else, seems to vary with the author and sometimes even within the
publications of the same author. For example, Wald [2], in his treatment of the
problem of fitting a straight line mentioned above, considers the $\{\alpha_i\}$ as unknown
constants; and Neyman and Scott, in their general formulation of the
problem given in [1] and described at the beginning of the present section, also
consider the $\{\alpha_i\}$ as unknown constants. On the other hand, Neyman in his treatment
[7] of the straight line problem treats the $\alpha_i$ as independently and identically
distributed chance variables. Also Neyman and Scott [8] criticize Wald’s solution
[2] on the ground that the conditions he postulates on the sequence of constants
$\{\alpha_i\}$ are such that they are unlikely to be satisfied when the $\{\alpha_i\}$ are independently
and identically distributed chance variables. Our own point of view,
and perhaps also that of the other writers cited, is that one need not insist on any
one formulation to the exclusion of all others. There are certainly reasonable
statistical problems where the $\{\alpha_i\}$ may be looked upon as independently and
identically distributed chance variables, and consequently the problem of the
present paper is statistically meaningful and interesting. This is also the attitude
implicit in [4] and [9].

⁴ A paper entitled “The minimum distance method,” which gives the details and proofs
of the results announced in [4], is scheduled for publication in a forthcoming issue of these
Annals.


2. Proof of consistency. As we have stated earlier, the essential idea of the 
proof comes from [5]. A compactification device has to be employed because the 
space I defined below may not be compact. 

We postulate that the following assumptions are fulfilled (see also the par- 
agraph preceding the lemma at the end of this section): 

ASSUMPTION 1: $f(x \mid \theta, \alpha)$ is a density with respect to a $\sigma$-finite measure $\mu$ on a
Euclidean space of which $x$ is the generic point. (This is also Wald’s Assumption 1.)

Let $\Omega$ be the space of possible values of $\theta$, and let $A$ be the space of values which
$\alpha_i$ can take. (Both $\Omega$ and $A$ are measurable subsets of Euclidean spaces, $f$ is
jointly measurable in $x$ and $\alpha$ for each $\theta$, and we hereafter denote by
$\theta^{(s)}$ $(1 \le s \le r)$ the components of a point $\theta$ in $\Omega$ and by $|\alpha|$ the Euclidean
distance from the origin of a point $\alpha \in A$; $\nu$ will denote Lebesgue measure on $A$.)
Let $\Gamma = \{G\}$ be a given space of (cumulative) distributions of $\alpha_i$. Let $\theta_0$, $G_0$ be,
respectively, the “true” value of the parameter $\theta$ and the “true” distribution of
$\alpha_i$. It is assumed that $\theta_0 \in \Omega$ and $G_0 \in \Gamma$. Let $\gamma = (\theta, G)$ be the generic point in
$\Omega \times \Gamma$. We define


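\[
f(x \mid \gamma) = \int_A f(x \mid \theta, z)\, dG(z) \tag{2.1}
\]

and $\gamma_0 = (\theta_0, G_0)$. In the space $\Omega \times \Gamma$ we define the metric

\[
\begin{aligned}
\delta(\gamma_1, \gamma_2) &= \delta([\theta_1, G_1], [\theta_2, G_2]) \\
&= \sum_{s=1}^{r} \left| \arctan \theta_1^{(s)} - \arctan \theta_2^{(s)} \right|
   + \int_A |G_1(z) - G_2(z)|\, e^{-|z|}\, d\nu(z).
\end{aligned} \tag{2.2}
\]

Let $\bar\Omega \times \bar\Gamma$ be the completed space of $\Omega \times \Gamma$ (the space together with the limits
of its Cauchy sequences in the sense of the metric (2.2)). Then $\bar\Omega \times \bar\Gamma$ is compact.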
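
The boundedness of the metric, which is what makes the completed space compact, can be seen in a small numerical sketch. The code below is illustrative only and is not part of the original argument: it assumes a scalar $\theta$, distributions $G$ with finitely many atoms (given as hypothetical `(point, mass)` lists), and the integrable weight $e^{-|z|}$ in the second term of the metric. The helper names `weight_cdf`, `step_df` and `delta` are ours.

```python
import math

def weight_cdf(z):
    """Integral of exp(-|t|) dt from -infinity to z."""
    return math.exp(z) if z <= 0 else 2.0 - math.exp(-z)

def step_df(atoms, z):
    """Value at z of the right-continuous df with the given (point, mass) atoms."""
    return sum(mass for point, mass in atoms if point <= z)

def delta(theta1, atoms1, theta2, atoms2):
    """Metric between (theta1, G1) and (theta2, G2); scalar theta, discrete G.

    Assumes the masses of each G sum to one, so that G1 - G2 vanishes
    to the left of the smallest atom and to the right of the largest.
    """
    theta_part = abs(math.atan(theta1) - math.atan(theta2))
    # |G1 - G2| is constant between consecutive atoms, so integrate piecewise.
    cuts = sorted({p for p, _ in atoms1} | {p for p, _ in atoms2})
    g_part = 0.0
    for left, right in zip(cuts, cuts[1:]):
        gap = abs(step_df(atoms1, left) - step_df(atoms2, left))
        g_part += gap * (weight_cdf(right) - weight_cdf(left))
    return theta_part + g_part

point_mass_0 = [(0.0, 1.0)]
point_mass_1 = [(1.0, 1.0)]
print(delta(0.0, point_mass_0, 0.0, point_mass_1))    # 1 - exp(-1)
print(delta(1e12, point_mass_0, -1e12, point_mass_0)) # stays below pi
```

Because the arc tangent is bounded and the weight is integrable, the distance between any two points is at most $\pi r + 2$ under these assumptions, so sequences with $\theta \to \pm\infty$ are Cauchy and acquire limits in the completion.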
ASSUMPTION 2 (Continuity Assumption): It is possible to extend the definition
of $f(x \mid \gamma)$ so that the range of $\gamma$ will be $\bar\Omega \times \bar\Gamma$ and so that, for any $\{\gamma_i\}$
and $\gamma^*$ in $\bar\Omega \times \bar\Gamma$, $\gamma_i \to \gamma^*$ implies

\[
f(x \mid \gamma_i) \to f(x \mid \gamma^*), \tag{2.3}
\]

except perhaps on a set of $x$ whose probability is 0 according to the probability
density $f(x \mid \gamma_0)$. (The exceptional $x$-set may depend on $\gamma^*$; $f(x \mid \gamma^*)$ need not be
a probability density function.) (This assumption corresponds to Wald’s
continuity Assumptions 3 and 5.)

ASSUMPTION 3: For any $\gamma$ in $\bar\Omega \times \bar\Gamma$ and any $\rho > 0$, $w(x \mid \gamma, \rho)$ is a measurable
function of $x$, where

\[
w(x \mid \gamma, \rho) = \sup f(x \mid \gamma'),
\]

the supremum being taken over all $\gamma'$ in $\bar\Omega \times \bar\Gamma$ for which $\delta(\gamma, \gamma') < \rho$. (This
assumption is made for the reasons given by Wald. See his remarks following
Assumption 8 in [5].)

ASSUMPTION 4 (Identifiability Assumption): If $\gamma_1$ in $\bar\Omega \times \bar\Gamma$ is different from $\gamma_0$,
then, for at least one $y$,

\[
\int_{-\infty}^{y} f(x \mid \gamma_1)\, d\mu \ne \int_{-\infty}^{y} f(x \mid \gamma_0)\, d\mu, \tag{2.4}
\]

the integral being over those $x$ all of whose components are $\le$ the corresponding
components of $y$. (This is the same as Wald’s Assumption 4.)

Let $X$ be a chance variable with density $f(x \mid \gamma_0)$. The operator $E$ will always
denote expectation under $\gamma_0$; $\gamma_0$ will always, of course, be a member of $\Omega \times \Gamma$.

ASSUMPTION 5 (Integrability Assumption): For any $\gamma$ in $\bar\Omega \times \bar\Gamma$ we have, as
$\rho \downarrow 0$,

\[
\lim E\left[\log \frac{w(X \mid \gamma, \rho)}{f(X \mid \gamma_0)}\right]^{+} < \infty. \tag{2.5}
\]

(This assumption is implied by assumptions corresponding to Wald’s Assumptions
2 and 6.)
For any $\gamma$ in $\bar\Omega \times \bar\Gamma$ other than $\gamma_0$, define $v = \log f(X \mid \gamma) - \log f(X \mid \gamma_0)$. We
begin the proof of consistency by showing that

\[
Ev < 0. \tag{2.6}
\]

First, if $\gamma$ is in $\Omega \times \Gamma$, $Ee^{v} \le 1$. Hence from (2.3) and Fatou’s lemma it follows
that, for any $\gamma$ in $\bar\Omega \times \bar\Gamma$,

\[
Ee^{v} \le 1. \tag{2.7}
\]

If $v$ is $-\infty$ with probability one according to $f(x \mid \gamma_0)$, then (2.6) is obvious.
Suppose therefore that $v > -\infty$ with positive probability according to
$f(x \mid \gamma_0)$. Then, by Jensen’s inequality and (2.7),

\[
Ev \le \log Ee^{v} \le 0, \tag{2.8}
\]

and the first equality sign can hold only if $v$ is a constant $c$ with probability one
according to $f(x \mid \gamma_0)$. If the first equality sign does not hold, (2.6) follows at once.
Consider, therefore, the constant $c$. If $c < 0$ then (2.6) holds. If $c > 0$ then
(2.8) is violated. We cannot have $c = 0$ because of Assumption 4. This proves
(2.6).
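
As an illustration of (2.6) in a case where everything can be computed, the sketch below evaluates $Ev$ numerically for a family that is our own choice and not one of the paper's examples: unit-variance normal densities indexed by their mean. For that family $v = (\theta - \theta_0)X - (\theta^2 - \theta_0^2)/2$, so $Ev = -(\theta - \theta_0)^2/2 < 0$. The function names are hypothetical.

```python
import math

def normal_pdf(x, mean, sd=1.0):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def expected_log_ratio(theta, theta0, lo=-12.0, hi=12.0, steps=24000):
    """Approximate E v = E[log f(X | theta) - log f(X | theta0)] under theta0.

    Uses the midpoint rule on [lo, hi]; the densities are N(theta, 1) and
    N(theta0, 1), so the integration range should be centred near theta0.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        v = -0.5 * (x - theta) ** 2 + 0.5 * (x - theta0) ** 2
        total += v * normal_pdf(x, theta0) * h
    return total

for theta in (0.5, 1.0, 3.0):
    print(theta, expected_log_ratio(theta, 0.0), -0.5 * theta ** 2)
```

The two printed columns agree to numerical accuracy, and both are strictly negative whenever $\theta \ne \theta_0$, as (2.6) requires.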

Now, as $\rho \downarrow 0$, for $\gamma \ne \gamma_0$,

\[
\lim E\left[\log \frac{w(X \mid \gamma, \rho)}{f(X \mid \gamma_0)}\right]^{+}
  = E\left[\log \frac{f(X \mid \gamma)}{f(X \mid \gamma_0)}\right]^{+} \tag{2.9}
\]

by (2.3), (2.5), and Lebesgue’s dominated convergence theorem. Also,

\[
\lim E\left[\log \frac{w(X \mid \gamma, \rho)}{f(X \mid \gamma_0)}\right]^{-}
  = E\left[\log \frac{f(X \mid \gamma)}{f(X \mid \gamma_0)}\right]^{-}, \tag{2.10}
\]

since the integrand of the left member decreases monotonically to the integrand
of the right member. Hence, as $\rho \to 0$,

\[
\lim E\left[\log \frac{w(X \mid \gamma, \rho)}{f(X \mid \gamma_0)}\right]
  = E \log \frac{f(X \mid \gamma)}{f(X \mid \gamma_0)} < 0 \tag{2.11}
\]

by (2.6). Just as in [5] (see also [12]) it may then be shown that, for any positive
$\rho$, there exists an $h(\rho)$, $0 < h(\rho) < 1$, such that the probability is one that, for all
$n$ sufficiently large,

\[
\sup \left\{ \frac{\prod_{i=1}^{n} f(X_i \mid \gamma)}{\prod_{i=1}^{n} f(X_i \mid \gamma_0)} \right\} < h^{n}, \tag{2.12}
\]

the supremum being taken over all $\gamma$ in $\bar\Omega \times \bar\Gamma$ for which $\delta(\gamma, \gamma_0) > \rho$, and where
$X_1, X_2, \dots$ are independent chance variables with the common density
$f(x \mid \gamma_0)$.

Let $L(x_1, \dots, x_n \mid \gamma) = \prod_{i=1}^{n} f(x_i \mid \gamma)$. A modified m.l. estimator is defined to be
a sequence of $\mu$-measurable functions $\{\hat\gamma_n\}$ such that

\[
L(x_1, \dots, x_n \mid \hat\gamma_n(x_1, \dots, x_n)) \ge c \sup_{\gamma} L(x_1, \dots, x_n \mid \gamma)
\]

for almost all $(\mu)$ $x_1, \dots, x_n$ for each $n$, where $c$ is a positive number (the
supremum is over $\Omega \times \Gamma$); for $c = 1$, this of course defines an m.l. estimator. (We shall
not be concerned in this paper with conditions which ensure the existence of such
measurable functions, although reasonable conditions are not difficult to
formulate.) We also define a neighborhood m.l. estimator to be a sequence of
$\mu$-measurable functions $\{\gamma_n^*\}$ such that there exists a sequence of positive numbers
$\epsilon_n$ with $\lim_{n \to \infty} \epsilon_n = 0$ for which $\sup_{\gamma \in \Pi_n} L(x_1, \dots, x_n \mid \gamma) = \sup_{\gamma} L(x_1, \dots, x_n \mid \gamma)$
for almost all $(\mu)$ $x_1, \dots, x_n$, where $\Pi_n$ is the set of all $\gamma$ in $\Omega \times \Gamma$ for which
$\delta(\gamma, \gamma_n^*(x_1, \dots, x_n)) < \epsilon_n$. (Thus, neighborhood m.l. estimators exist in some
cases where m.l. and modified m.l. estimators do not; this will be useful in
making clear certain examples below where the lack of consistency is not merely
due, as it might seem, to the fact that no strict m.l. or modified m.l. estimator
exists.)
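
The definition of a modified m.l. estimator can be made concrete with a toy computation. The sketch below is an assumption-laden illustration rather than the paper's setting: the parameter is a single normal mean, the supremum is taken over a finite grid instead of $\Omega \times \Gamma$, and `modified_ml_set` (a name of our choosing) returns every grid point whose likelihood is at least $c$ times the grid supremum. Any measurable selection from that set is a modified m.l. estimator in this toy sense, and $c = 1$ recovers the m.l. estimator.

```python
import math

def log_likelihood(theta, xs):
    """Log-likelihood of a N(theta, 1) sample."""
    return sum(-0.5 * (x - theta) ** 2 - 0.5 * math.log(2.0 * math.pi) for x in xs)

def modified_ml_set(xs, grid, c):
    """Grid points whose likelihood is at least c times the grid supremum.

    Requires 0 < c <= 1; the comparison is done on the log scale.
    """
    logs = [log_likelihood(t, xs) for t in grid]
    threshold = max(logs) + math.log(c)
    return [t for t, value in zip(grid, logs) if value >= threshold]

sample = [0.2, -0.1, 0.5, 0.0]
grid = [i / 100 for i in range(-100, 101)]
print(modified_ml_set(sample, grid, 1.0))       # the grid m.l. estimate
print(len(modified_ml_set(sample, grid, 0.5)))  # a whole interval of admissible values
```

Lowering $c$ enlarges the admissible set, which is why modified m.l. estimators can exist when the supremum itself is not attained.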

The above result (2.12) implies the strong convergence of m.l., modified m.l.,
and neighborhood m.l. estimators (in the respective cases where they exist).
The component of the estimator which estimates $G_0$ converges to it at all its
points of continuity w.p.1.

We remark that the above proof actually demonstrates consistency if, in the
definition of m.l. estimator (or its variants), the supremum is taken over $\bar\Omega \times \bar\Gamma$
instead of over $\Omega \times \Gamma$ or, in fact, over any subset of $\bar\Omega \times \bar\Gamma$ containing $\gamma_0$. This
last fact implies that if consistency is verified in an example where $\Omega = \Omega_1$,
$\Gamma = \Gamma_1$, then it automatically holds in the example where $\Omega = \Omega_2$, $\Gamma = \Gamma_2$,
whenever $\Omega_2 \subset \Omega_1$ and $\Gamma_2 \subset \Gamma_1$. In particular, this remark applies to the
examples of Sections 3, 4, and 5.

We remark that Assumption 1 is not really essential in the above proof. Let
$P_\gamma$ denote the probability measure of $X$ when $\gamma$ is the true parameter value,
and let $d(x, \gamma, \gamma_0) = r(x, \gamma, \gamma_0)/[1 - r(x, \gamma, \gamma_0)]$, where $r(x, \gamma, \gamma_0)$ denotes a
Radon-Nikodym derivative of $P_\gamma$ with respect to $P_\gamma + P_{\gamma_0}$ at the point $x$. If,
for each $\gamma_0 \in \Omega \times \Gamma$, Assumptions 2 and 3 are satisfied when $f(x \mid \gamma)$ is replaced
by $d(x, \gamma, \gamma_0)$, if (2.4) is replaced by the condition that $d(x, \gamma, \gamma_0) = 1$ does not
hold on a set of probability one under $\gamma_0$ for any $\gamma$, and if $f(x \mid \gamma)/f(x \mid \gamma_0)$ is replaced
by $d(x, \gamma, \gamma_0)$ (with a similar replacement for $w(x \mid \gamma, \rho)$) in Assumption 5
and in the argument of the section, then (2.12) (with the replacement noted
above) will still hold. An m.l. estimator $\hat\gamma$ is now defined to be one for which
$\sup_{\gamma} \prod_{i=1}^{n} d(X_i, \gamma, \hat\gamma) = 1$ (with an analogous definition of modified and
neighborhood m.l. estimator). We have not stated our assumptions and result (2.12)
in this more general setting above because the stated form of the assumptions
will suffice in most applications and will be easier to verify than assumptions
stated in terms of $d(x, \gamma, \gamma_0)$ (which must be verified for each $\gamma_0$). As an example
of the use of the more general result just cited, consider the problem of
estimating the df $F$ of a sequence of independent identically distributed discrete
random variables, it being assumed that the true probability measure $P_F$
(corresponding to the df $F$) satisfies

\[
\sum P_F(x) \log P_F(x) > -\infty,
\]

the sum being over all points $x$ for which $P_F(x) > 0$. Then the assumptions
are easily seen to be satisfied, and we may conclude that the sample df, which
is the m.l. estimator, is a consistent estimator of $F$, a well-known result which
does not usually seem to be considered as an example of the consistency of the
m.l. estimator. (Of course, even if no restrictions of discreteness or logarithmic
summability are placed on $P_F$, the sample df is still consistent and, as pointed
out in the introduction, this is the m.l. estimator. However, Assumption 5 is
not satisfied in this case.)

Before proceeding to our examples in subsequent sections, we prove a simple 
lemma which will be useful later in verifying Assumption 5. 

LEMMA. If $f(x_1, \dots, x_k)$ is a bounded probability density function with respect
to Lebesgue measure $\mu$ on Euclidean $k$-space $R^k$, and if

\[
\int_{|x_i| > 1} (\log |x_i|)\, f\, d\mu < \infty \qquad (1 \le i \le k), \tag{2.13}
\]

then

\[
-\int_{R^k} f \log f\, d\mu < \infty. \tag{2.14}
\]

PROOF: If we prove that (2.13) implies (2.14) when $f$ is replaced by $cf$ in these
equations, where $c > 0$, then the lemma is clearly proved. Thus, since $f$ was
assumed bounded, we may hereafter assume $f < (2e)^{-1}$. (The new $f$ need not
have integral unity.) Let

\[
g(x_1, \dots, x_k) = f(x_1, \dots, x_k) + \prod_{i=1}^{k} (x_i^2 + 1)^{-1}. \tag{2.15}
\]

Clearly, (2.13) is true with $f$ replaced by $g$. Moreover, since $g(x_1, \dots, x_k) < e^{-1}$
outside of a sufficiently large sphere about the origin, and since $-f \log f < -g \log g$
if $0 < f < g < e^{-1}$, it suffices to prove (2.14) with $f$ replaced by $g$, assuming
$g$ bounded and (2.13) with $f$ replaced by $g$. By (2.13), we have

\[
\int_{R^k} g \log \prod_{i=1}^{k} (1 + x_i^2)^2\, d\mu < \infty. \tag{2.16}
\]

Thus, it suffices to prove the finiteness of

\[
-\int_{R^k} g \log g\, d\mu - \int_{R^k} g \log \prod_{i=1}^{k} (1 + x_i^2)^2\, d\mu
 = \int_{R^k} g \log \prod_{i=1}^{k} (1 + x_i^2)^2
   \left\{ \frac{-\log \left[ g \prod_{i=1}^{k} (1 + x_i^2)^2 \right]}{\log \prod_{i=1}^{k} (1 + x_i^2)^2} \right\} d\mu. \tag{2.17}
\]

The fact that $g(x_1, \dots, x_k) \ge \prod_{i=1}^{k} (x_i^2 + 1)^{-1}$ (see (2.15)) implies easily that the
bracketed expression in (2.17) is $\le 1$; by (2.16), this completes the proof of the
lemma.


3. Example 1. Structural location parameter, incidental scale parameter.
Let $k$ be a positive integer, let $\mu$ be Lebesgue measure on Euclidean $k$-space, let
$g$ be a univariate probability density function with respect to Lebesgue measure,
and let

\[
f(x_i \mid \theta, \alpha_i) = \frac{1}{\alpha_i^{k}} \prod_{j=1}^{k} g\left(\frac{x_{ij} - \theta}{\alpha_i}\right), \tag{3.1}
\]

where $x_i = (x_{i1}, \dots, x_{ik})$. (Thus, observations are taken in groups of $k \ge 1$,
the value of the incidental parameter being the same within each group. The
(unconditional) density of $X_i = (X_{i1}, \dots, X_{ik})$ is given by Equation (2.1).
Thus, the $X_i$ are independent, but, for fixed $i$, the $X_{ij}$ $(j = 1, \dots, k)$ need not
be independent.) Here $\Omega$ is the real line. Some further assumptions on $g$ will be
made below; we remark here that the important case

\[
g(x) = (2\pi)^{-1/2} e^{-x^2/2} \tag{3.2}
\]

will satisfy our assumptions. (See also (3.4) below.)

The cases $k = 1$ and $k > 1$ are essentially different. In Example 1a the
consistency of the m.l. estimator will be proved for $k = 1$ assuming that $A$ is the
set of values $\alpha \ge c$ where $c$ is a known positive constant, and it is pointed out
that the property of consistency of the m.l. estimator does not hold without this
assumption. The proof of consistency in Example 1a is actually carried out for
$k \ge 1$ since this requires little additional space and will save space in Example 1b
where we may refer back to 1a for proofs. In Example 1b we prove consistency
of the m.l. estimator in the case $k > 1$ without assuming $\alpha \ge c > 0$.

Example 1a. We assume that $k \ge 1$ and that $A$ is the set of all real values
$\alpha \ge c$ where $c$ is a known positive constant. In the case $k = 1$, this assumption
on $A$ can be weakened slightly to an assumption on the behavior of $G(\alpha)$ as
$\alpha \to 0$; however, some such assumption is necessary for consistency, since the
last example of Section 6 shows that, even in cases where $\Gamma$ is restricted to a
simple parametric class of df’s on a set of positive reals which is not bounded
away from zero, it can happen that no m.l. or modified m.l. estimator exists
and that there are neighborhood m.l. estimators which are not consistent.

We now state our assumptions on $g$ and $G_0$. They seem reasonable and are in
a form which makes brief proofs possible; they undoubtedly can be weakened.
(These last remarks apply also to Examples 2 and 3. See also the first part of
Section 6 for one method by which we can prove the results of our examples
under weaker conditions.) We hereafter assume

(3.3)
  (a) $\sup_x g(x) < \infty$;
  (b) $g$ is lower semicontinuous and for every $\epsilon > 0$ there is a continuous
      function $h_\epsilon \ge g$ for which $\int [h_\epsilon(x) - g(x)]\, dx < \epsilon$;
  (c) $\lim_{|x| \to \infty} g(x) = 0$;
  (d) $-\int g(x) [\log |x|]^{+}\, dx > -\infty$;
  (e) $\int |x|^{it} g(x)\, dx \ne 0$ for almost all real $t$;
  (f) $g(x) > 0$ for almost all $x$ in some open interval whose closure
      contains the point $x = 0$.

We note that, in addition to being satisfied in the case (3.2), Assumption (3.3)
is also satisfied in such important cases as

(3.4)
  (a) $g(x) = 1/\pi(1 + x^2)$;
  (b) $g(x) = 1$ if $|x| < \tfrac{1}{2}$ and $g(x) = 0$ otherwise;
  (c) $g(x) = e^{-x}$ if $x > 0$ and $g(x) = 0$ otherwise.

Of course, if $g$ does not satisfy (3.3) but if there is a function $g^*$ satisfying (3.3)
and for which $g(x) = g^*(x)$ almost everywhere, then we may carry out our
considerations replacing $g$ by $g^*$.

We assume that $\Gamma$ consists of all $G$ such that

\[
\int_{c}^{\infty} \log \alpha\, dG(\alpha) < \infty, \tag{3.5}
\]

where $c$ is the constant used before in the definition of $A$. For example, $G$
belongs to $\Gamma$ if, for some positive constants $b$ and $\epsilon$,

\[
1 - G(\alpha) < \frac{b}{\log \alpha\, (\log \log \alpha)^{1 + \epsilon}} \tag{3.6}
\]

for $\alpha > e^{e}$; integration by parts will verify that (3.6) implies (3.5). Condition
(3.5) is weaker than the requirement that any positive (not necessarily integral)
moment of $G$ be finite.

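We now verify the assumptions of Section 2. We complete the definition of $f$
for $(\theta, \alpha)$ in $\bar\Omega \times \bar{A}$ by setting $f(x \mid \theta, \alpha) = 0$ whenever $\theta = \pm\infty$ or $\alpha = \infty$.
For $(\theta, G) \in \bar\Omega \times \bar\Gamma$, we then define $f(x \mid \theta, G)$ by (2.1). (We remark that $\bar\Gamma$
obviously contains all df’s on $\bar{A}$.) Assumption 1 is obviously satisfied. Assumption
3 follows from the fact that (3.3) implies that $f(x \mid \theta, G)$ is for each $x$ lower
semicontinuous in $(\theta, G)$ (in the sense of the metric $\delta$) on $\bar\Omega \times \bar\Gamma$, and the fact that
$\bar\Omega \times \bar\Gamma$ is separable. Write $h_\epsilon(x_i \mid \theta, \alpha) = \alpha^{-k} \prod_{j=1}^{k} h_\epsilon[(x_{ij} - \theta)/\alpha]$. In order to
verify Assumption 2, we note that, by the lower semicontinuity in $(\theta, G)$ of
$f(x \mid \theta, G)$ and by the Helly-Bray theorem, we have (assuming, as we may,
that the $h_\epsilon$ of (3.3) (b) satisfies $\lim_{|x| \to \infty} h_\epsilon(x) = 0$) that $(\theta_i, G_i) \to (\theta^*, G^*)$
as $i \to \infty$ implies

\[
\begin{aligned}
f(x \mid \theta^*, G^*) &\le \liminf_{i \to \infty} \int f(x \mid \theta_i, \alpha)\, dG_i
   \le \limsup_{i \to \infty} \int f(x \mid \theta_i, \alpha)\, dG_i \\
 &\le \lim_{i \to \infty} \int h_\epsilon(x \mid \theta_i, \alpha)\, dG_i = \int h_\epsilon(x \mid \theta^*, \alpha)\, dG^*.
\end{aligned} \tag{3.7}
\]

Since the last member of (3.7) is greater than or equal to the first for all $x$ and
since their difference has integral $< \epsilon'$ (with respect to $\mu$), Assumption 2
follows at once.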

In verifying Assumption 4, it clearly suffices to prove that, if $f(x \mid \theta_0, G_0) =
f(x \mid \theta_1, G_1)$ for almost all $x$, where $(\theta_i, G_i) \in \bar\Omega \times \bar\Gamma$ for $i = 0, 1$, then $(\theta_0, G_0) =
(\theta_1, G_1)$. If an interval $0 < x < \epsilon$ satisfies (3.3) (f), there is a value $\beta$ such that

\[
P\{X_{1j} \le t \text{ for } 1 \le j \le k \mid \theta_0, G_0\} \le \beta
\]

is satisfied (whatever be $G_0$) if and only if $t \le \theta_0$, a similar assertion holding
if the interval $-\epsilon < x < 0$ satisfies (3.3) (f). Hence, it suffices to prove the above
assertion when $\theta_0 = \theta_1$, since it cannot hold when $\theta_0 \ne \theta_1$. Let $H_i$ be the df of
the random variable $\log \alpha_i$ when $G_i$ is the df of the random variable $\alpha_i$; i.e.,
$H_i(t) = G_i(e^{t})$. Then, putting $g^*(x) = e^{x}[g(e^{x}) + g(-e^{x})]$ ($g^*$ is the density of
$\log |U|$ when $g$ is the density of $U$), it suffices to prove that, if $H_0$ and $H_1$ are
not identical, then $p_1(z_1, \dots, z_k)$ and $p_0(z_1, \dots, z_k)$ are not identical for almost
all $(z_1, \dots, z_k)$, where

\[
p_i(z_1, \dots, z_k) = \int_{-\infty}^{\infty} \prod_{j=1}^{k} g^*(z_j - \beta)\, dH_i(\beta). \tag{3.8}
\]

Let $g^{**}$ be the density function of $\sum Z_j/k$ when the $Z_j$ are independent random
variables with common density $g^*$. The above assertion is then implied by the
assertion that the function

\[
q(r) = \int g^{**}(r - \beta)\, dH(\beta) \tag{3.9}
\]

uniquely determines the df $H$. But if $A$, $B$, $C$ are the characteristic functions
of $q$, $g^{**}$, $H$, respectively, then $B(t) \ne 0$ for almost all $t$ by (3.3) (e) and hence
$C(t)$ is determined for those $t$ for which $B(t) \ne 0$ by $C(t) = A(t)/B(t)$ and
elsewhere by continuity. Thus, Assumption 4 is verified.

It remains to verify Assumption 5. Since $f(x \mid \theta, G)$ is uniformly bounded in
$x$, $\theta$, $G$, Assumption 5 will clearly be satisfied if

\[
E \log f(X_1 \mid \theta_0, G_0) > -\infty. \tag{3.10}
\]

Since the left side of (3.10) does not depend on $\theta_0$, we may assume $\theta_0 = 0$.
By (3.3) (d) and (3.5), we have

\[
E[\log |X_{11}|]^{+} = E\left[\log \left|\frac{X_{11}}{\alpha_1}\right| + \log \alpha_1\right]^{+}
 \le E\left[\log \left|\frac{X_{11}}{\alpha_1}\right|\right]^{+} + E[\log \alpha_1]^{+} < \infty; \tag{3.11}
\]

equation (3.10) is a consequence of (3.11) and the lemma at the end of Section 2.
This completes our verification of the fact that the assumptions of Section 2
are implied by (3.3) and (3.5).

Example 1b. We now assume $k > 1$. $A$ is the set of all positive $\alpha$, while $\Gamma$
is the set of all df’s $G$ on $A$ satisfying

\[
\int |\log \alpha|\, dG(\alpha) < \infty. \tag{3.12}
\]

We assume that $g$ satisfies (3.3) (some alterations could be made here but,
for the sake of brevity, we forego making them) and also that

(3.13)
  (a) $\lim_{|x| \to \infty} x g(x) = 0$;
  (b) $\sup_{x} \left[\min_{r < s} |x_r - x_s|\right]^{k} \prod_{j=1}^{k} g(x_j) < \infty$.

Assumption (3.13) is easily verified, for example, in cases (3.2) and (3.4).

We now verify the assumptions of Section 2. We define $f(x \mid \theta, \alpha) = 0$
whenever $\theta = \pm\infty$ or $\alpha = 0$ or $\infty$; $f(x \mid \theta, G)$ is then defined by (2.1) for $(\theta, G) \in \bar\Omega \times \bar\Gamma$.
Assumptions 1, 3, and 4 are verified exactly as in Example 1a. In verifying
Assumption 2, we may follow the demonstration of Example 1a, noting only that
the $h_\epsilon$ of (3.3) (b) may (because of (3.13) (a)) clearly be assumed to satisfy
$\lim_{|x| \to \infty} x h_\epsilon(x) = 0$, so that for every $x$ none of whose components is $\theta$,

\[
\lim_{\alpha \to 0} h_\epsilon(x \mid \theta, \alpha) = 0; \tag{3.14}
\]

thus, for almost all $(\mu)$ $x$, the Helly-Bray theorem may still be used at the last
step of (3.7), no difficulty being caused by the possibility that $\liminf_{i \to \infty} G_i(0) <
G^*(0)$.

It remains to verify Assumption 5. Now, $f(x \mid \theta, G)$ is no longer uniformly
bounded as it was in Example 1a. However, by (3.13) (b), there is a constant
$B$ such that, for all $x_1 = (x_{11}, \dots, x_{1k})$ none of whose components are equal,
every $\theta \in \Omega$, and every $\alpha \in A$,

\[
\begin{aligned}
f(x_1 \mid \theta, \alpha) &= \left[\min_{r < s} |x_{1r} - x_{1s}|\right]^{-k}
   \left\{ \left[\min_{r < s} |y_r - y_s|\right]^{k} \prod_{j=1}^{k} g(y_j) \right\} \\
 &\le B \left[\min_{r < s} |x_{1r} - x_{1s}|\right]^{-k},
\end{aligned} \tag{3.15}
\]

where $y_r = (x_{1r} - \theta)/\alpha$. Hence, for almost all $x_1$,

\[
\begin{aligned}
\sup_{\theta \in \Omega,\ \alpha \in A} \log f(x_1 \mid \theta, \alpha)
 &\le \log B + k \max_{r < s} \log \left[1/|x_{1r} - x_{1s}|\right] \\
 &\le \log B + k \sum_{r < s} \left[\log (1/|x_{1r} - x_{1s}|)\right]^{+}.
\end{aligned} \tag{3.16}
\]

Now, by (3.3) (a), there is a value $B'$ such that $g(x) < B'$ for all $x$. Hence, by
(3.12), $B_1$ denoting a finite constant, we have

\[
\begin{aligned}
E[\log (1/|X_{11} - X_{12}|)]^{+} &\le E[\log (1/\alpha_1)]^{+} + E[\log (\alpha_1/|X_{11} - X_{12}|)]^{+} \\
 &\le B_1 - 2 \int_{-\infty}^{\infty} g(z_2) \int_{z_2}^{z_2 + 1} B' \log (z_1 - z_2)\, dz_1\, dz_2 \\
 &= B_1 + 2B' < \infty.
\end{aligned} \tag{3.17}
\]

From (3.16) and (3.17), we obtain

\[
E \sup_{\gamma \in \bar\Omega \times \bar\Gamma} \log f(X_1 \mid \gamma) < \infty. \tag{3.18}
\]

Assumption 5 is a consequence of (3.18) and of (3.10), the latter of which is
proved exactly as in Example 1a. This completes the verification of the
assumptions of Section 2 in Example 1b.

The discrete analogue of Example 1 can be carried out similarly by letting $x$,
$\theta$, $\alpha$ take on only rational values; this is, however, of less practical importance.
The multivariate extension of Example 1 ($X_{ij}$ a vector) may also be carried
out similarly.


4. Example 2. The straight line with both variables subject to error. 

In this section we shall treat the case $k = 1$ of fitting a straight line with
both variables subject to normal error, a famous problem with a long history.

We consider a system $\{(X_{i1}, X_{i2})\}$, $i = 1, 2, \dots$, of independent chance 2-vectors
(the two components $X_{i1}$, $X_{i2}$ need not be independent for fixed $i$). We have
$\theta = (\theta_1, \theta_2)$, $\Omega$ the entire plane, $\theta_0 = (\theta_{10}, \theta_{20})$, $A$ the entire line. $\Gamma$ is the totality
of all non-normal (univariate) distributions $G$ (a chance variable which is
constant with probability one is to be considered normally distributed with variance
zero) which satisfy

\[
\int (\log |\alpha|)^{+}\, dG(\alpha) < \infty.
\]

It is known to the statistician that

\[
\begin{aligned}
X_{i1} &= \alpha_i + u_i, \\
X_{i2} &= \theta_{10} + \theta_{20} \alpha_i + v_i,
\end{aligned}
\]

where $(u_i, v_i)$ are jointly normally distributed chance variables with means
zero, each pair $(u_i, v_i)$ distributed independently of every other pair and of the
independent chance variables $\{\alpha_i\}$, with a common covariance matrix which is
unknown to the statistician.

It is known (see [10]) that the distribution of $(X_{i1}, X_{i2})$ then determines $\theta_0$
uniquely, but in general not $G_0$, the “true” df of $\alpha_i$, or the “true” covariance
matrix

\[
\begin{pmatrix} d_{11}^{0} & d_{12}^{0} \\ d_{12}^{0} & d_{22}^{0} \end{pmatrix}
\]

of $(u_i, v_i)$. However, a “canonical” complex is determined. (See [4].)

Complete the spaces $\Omega$, $A$, and $\Gamma$ to obtain $\bar\Omega$, $\bar{A}$ and $\bar\Gamma$. The space $\bar\Gamma$ contains
all normal distributions on $\bar{A}$, but this will cause us no trouble in estimating $\theta_0$,
as we shall soon see.

Let $D$ be the space of all triples $(d_{11}, d_{12}, d_{22})$ such that

\[
d_{11} \ge \lambda_{11} > 0, \qquad d_{22} \ge \lambda_{22} > 0, \qquad
d_{11} d_{22} - d_{12}^{2} \ge \lambda_{12} > 0,
\]

where $\lambda_{11}$, $\lambda_{22}$, $\lambda_{12}$ are given positive numbers. (This will be discussed further
below.) We define a metric in $D$ in the same way that one is defined on $\Omega$. Let
$\bar{D}$ be the completed space. We shall assume that the “true” triple $d_{11}^{0}$, $d_{12}^{0}$,
$d_{22}^{0}$ is in $D$.

The place of $\Omega \times \Gamma$ in Section 2 and in Example 1 will now be taken by
$\Omega \times \Gamma \times D$. We therefore define

\[
\gamma = (\theta_1, \theta_2, G, d_{11}, d_{12}, d_{22})
\]

as the generic point in $\Omega \times \Gamma \times D$.
Let $f(x_1, x_2 \mid \theta_1, \theta_2, \alpha, d_{11}, d_{12}, d_{22})$ be the joint density function of $(X_{i1}, X_{i2})$
when $\theta = (\theta_1, \theta_2)$, $\alpha_i = \alpha$, and the covariance matrix of $(u_i, v_i)$ is

\[
\begin{pmatrix} d_{11} & d_{12} \\ d_{12} & d_{22} \end{pmatrix}
\]

($\mu$ is Lebesgue measure in the plane). If, in the above, $\theta$ is in $\bar\Omega - \Omega$ or $\alpha$ is in
$\bar{A} - A$ or $(d_{11}, d_{12}, d_{22})$ is in $\bar{D} - D$, we define $f$ to be zero. Finally we define

\[
f(x_1, x_2 \mid \gamma) = \int_{\bar{A}} f(x_1, x_2 \mid \theta_1, \theta_2, \alpha, d_{11}, d_{12}, d_{22})\, dG(\alpha).
\]

It is known ([10] and [4]) that all $\gamma$ in the same canonical class, and only such,
define the same $f(x_1, x_2 \mid \gamma)$ (of course, to within a set of $\mu$-measure zero). Two
members of the same canonical class have the same $\theta = (\theta_1, \theta_2)$ but different
$G$’s and $d_{ij}$’s. We shall estimate only $\theta_0$. For an estimator of the entire
canonical complex by the minimum distance method under necessary
assumptions only, see [4]. In Section 5 below will be found an explanation of why
the entire canonical complex cannot be estimated by the m.l. method.

From the definition of $f(x_1, x_2 \mid \gamma)$ it follows immediately that Assumptions
1, 2, and 3 of Section 2 are satisfied. Since we are estimating only $\theta_0$, it is
sufficient to verify Assumption 4 only for $\theta_0$ and $\theta^* \ne \theta_0$, i.e., if we write the $\gamma_0$
and $\gamma_1$ of (2.4) as

\[
\begin{aligned}
\gamma_0 &= (\theta_{10}, \theta_{20}, G_0, d_{11}^{0}, d_{12}^{0}, d_{22}^{0}), \\
\gamma_1 &= (\theta_1^*, \theta_2^*, G_1, d_{11}, d_{12}, d_{22}),
\end{aligned}
\]

only $\theta_0 = (\theta_{10}, \theta_{20})$ has to be different from the corresponding $\theta^* = (\theta_1^*, \theta_2^*)$.
Now we know that $G_0$ is in $\Gamma$, hence is not normal and assigns probability one
to $A$. If $G_1$ is also in $\Gamma$ then Assumption 4 follows at once from the results of
Reiersøl [10] or from [11]. If $G_1$ assigns probability less than one to $A$, $f(x \mid \gamma_1)$
assigns probability less than one to the Euclidean plane of $(x_1, x_2)$. If $G_1$ is
normal and assigns probability one to $A$, then $(X_{i1}, X_{i2})$ are jointly normal
under $\gamma_1$, but not under $\gamma_0$. Thus Assumption 4 is always satisfied.

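To verify Assumption 5 we proceed essentially as in Example 1, and use
the lemma at the end of Section 2. Assumption 5 is satisfied if

\[
E \log f(X_{i1}, X_{i2} \mid \gamma_0) > -\infty.
\]

By the lemma this will follow if we prove

\[
E[\log |X_{ij}|]^{+} < \infty
\]

for $j = 1, 2$. Now

\[
\begin{aligned}
E[\log |X_{i1}|]^{+} &\le E[\log (|X_{i1} - \alpha_i| + |\alpha_i|)]^{+} \\
 &\le E \log [|X_{i1} - \alpha_i| + 1] + E[\log |\alpha_i|]^{+} \\
 &= E \log [|u_i| + 1] + E[\log |\alpha_i|]^{+} \\
 &< \infty.
\end{aligned}
\]

Similarly,

\[
\begin{aligned}
E[\log |X_{i2}|]^{+} &\le E[\log (|X_{i2} - \theta_{10} - \theta_{20}\alpha_i| + |\theta_{10} + \theta_{20}\alpha_i|)]^{+} \\
 &\le E \log [|X_{i2} - \theta_{10} - \theta_{20}\alpha_i| + 1] + E[\log |\theta_{10} + \theta_{20}\alpha_i|]^{+} \\
 &\le E \log [|v_i| + 1] + [\log |\theta_{10}|]^{+} + E \log [1 + |\theta_{20}\alpha_i|] \\
 &< \infty.
\end{aligned}
\]

Thus we have shown, under our assumptions on $\Gamma$ and $D$, that Assumptions
1 through 5 of Section 2 are satisfied, so that the m.l. estimator of $\theta_0$ converges
strongly to $\theta_0$ as $n \to \infty$.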

The assumption on $D$ (that $d_{11}$, $d_{22}$, and $d_{11} d_{22} - d_{12}^{2}$ are bounded away from
zero) cannot be entirely dispensed with. For if $D$ consists of all triples for which
$d_{11}$, $d_{22}$, and $d_{11} d_{22} - d_{12}^{2}$ are positive, if $S_n$ is the sample df of $x_{11}, \dots, x_{n1}$,
and if $\gamma_\epsilon$ is the complex $(0, 0, S_n, \epsilon, 0, \sum_{i=1}^{n} x_{i2}^{2})$, then it is easily verified that
$\lim_{\epsilon \to 0} L((x_{11}, x_{12}), \dots, (x_{n1}, x_{n2}) \mid \gamma_\epsilon) = \infty$; thus, no m.l. or modified m.l.
estimator exists, and there are neighborhood m.l. estimators which are not
consistent (for $\theta_0$).
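
The divergence of the likelihood along such a family of complexes is easy to observe numerically. The following sketch is illustrative only: the four data pairs are made up, the function names are ours, and the complex is taken to be $(\theta_1, \theta_2, G, d_{11}, d_{12}, d_{22}) = (0, 0, S_n, \epsilon, 0, \sum x_{i2}^2)$ with $S_n$ the sample df of the first coordinates. Under that complex the two coordinates are independent, the first being a mixture of $N(x_{j1}, \epsilon)$ components with weights $1/n$ and the second $N(0, d_{22})$.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood_eps(pairs, eps):
    """Log-likelihood at the complex (0, 0, S_n, eps, 0, sum of x_i2 squared)."""
    n = len(pairs)
    d22 = sum(x2 ** 2 for _, x2 in pairs)
    total = 0.0
    for x1, x2 in pairs:
        # G = S_n puts mass 1/n on each observed first coordinate.
        mixture = sum(normal_pdf(x1, a, eps) for a, _ in pairs) / n
        total += math.log(mixture) + math.log(normal_pdf(x2, 0.0, d22))
    return total

# Hypothetical observations (x_i1, x_i2); any data with distinct x_i1 would do.
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.2), (3.0, 6.9)]
for eps in (1e-2, 1e-4, 1e-8):
    print(eps, log_likelihood_eps(pairs, eps))
```

Each observation contributes a term of order $-\tfrac{1}{2}\log \epsilon$ through the mixture component centred at itself, so the log-likelihood grows without bound as $\epsilon \to 0$ even though the complex places $\theta$ at $(0, 0)$ regardless of the data.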

The case $k > 1$ is much simpler to treat than the above case. It is easy to
see that then the covariance matrix of $(u_i, v_i)$ is uniquely determined, and
from this it follows easily that the whole complex $\gamma$ is uniquely determined. The
problem can be treated in a manner similar to that of Examples 1b and 3b.

The problem of this section with the distribution of $(u_i, v_i)$ other than normal
may also be treated by the m.l. method, as in Examples 1 and 3. The last
paragraph of Section 3 applies also to the present example.


5. Example 3. Structural scale parameter, incidental location parameter.
We consider here the case of a structural scale parameter and an incidental
location parameter; this reverses the roles of the two parameters of Example 1.
Thus, we suppose $\mu$ to be Lebesgue measure on $R^k$ and

\[
f(x_i \mid \theta, \alpha_i) = \frac{1}{\theta^{k}} \prod_{j=1}^{k} g\left(\frac{x_{ij} - \alpha_i}{\theta}\right). \tag{5.1}
\]

The cases $k = 1$ and $k > 1$ are essentially different, and we consider them
separately.

Example 3a. The case $k = 1$. This example is another simple one where no m.l.
estimator is consistent, and also shows, in a simpler setting, why in Example 2
the m.l. method was incapable of estimating the components of the canonical
complex other than $\theta$. Since Example 3a is intended to illustrate the failure
of the m.l. method in certain situations, we shall for simplicity assume that $g$
is given by (3.2); examples with other $g$ (e.g., (3.4)) may be treated similarly.
$\Omega$ may be taken to be any specified set of positive numbers containing more
than one point; for the sake of brevity, we assume that $\Omega$ contains its greatest
lower bound $c$ (say) (and thus, that $c > 0$), but it is easy to carry through a
similar demonstration (with modified or neighborhood m.l. estimators in place
of m.l. estimators) when $c \notin \Omega$. $\Gamma$ is taken to be the class of all df’s $G$ on the
real line for which $\int [\log |\alpha|]^{+}\, dG(\alpha) < \infty$ and such that $G$ has no normal
component; i.e., no $G$ in $\Gamma$ can be represented as the convolution of two df’s, one of
which is normal with positive variance. ($\Gamma$ may be further restricted, e.g., by
the condition that for each $G$ there is a bounded set outside of which $G$ has no
variation.)

All assumptions of Section 2 are easily verified except Assumption 4; there is
no difficulty of identifiability in $\Omega \times \Gamma$, but there clearly is in $\bar\Omega \times \bar\Gamma$. Consider
now the expression

\[
\prod_{i=1}^{n} \int_{-\infty}^{\infty} e^{-(x_i - \alpha)^2 / 2c^2}\, dM(\alpha). \tag{5.2}
\]

It is clear that the maximum of (5.2) with respect to $M$ can be achieved only by
an $M$ which assigns probability one to the interval $(\min (x_1, \dots, x_n), \max
(x_1, \dots, x_n))$ and hence which has no normal component. This discussion of
the expression (5.2) shows that, for every $n$, any m.l. estimator (the fact that
the maximum is attained is easily verified) of $(\theta, G)$ subject to our assumption
$\theta \ge c$ always estimates $\theta$ to be $c$. Thus, no m.l. estimator of $(\theta, G)$ is consistent
(unless $\theta_0 = c$).

To summarize the result of this example, then, the m.l. method is incapable
of estimating consistently the normal component of the df of the sequence
$\{X_i\}$ of independent identically distributed random variables because, in every
neighborhood of a point $(\theta, G)$ with $\theta > c$, there are points with $\theta = c$ (and
for which the likelihood is larger).
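
The source of the difficulty is the elementary identity that a normal df with standard deviation $\theta$ is the convolution of a normal df with standard deviation $c$ and one with standard deviation $s = (\theta^2 - c^2)^{1/2}$, so that $(\theta, G)$ and $(c, N_s * G)$ always give the same density for $X$. The sketch below checks this numerically in the simplest case we could choose, $G$ a point mass at the origin; the function names and the particular values $c = 0.5$, $\theta = 1$ are assumptions made for illustration.

```python
import math

def normal_pdf(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def convolved_density(x, c, theta, steps=20000):
    """Density at x of N_c * N_s with s**2 = theta**2 - c**2 (midpoint rule).

    This is the density of X under the pair (c, M) when the mixing df M is
    normal with standard deviation s; it requires theta > c > 0.
    """
    s = math.sqrt(theta ** 2 - c ** 2)
    lo, hi = -12.0 * s, 12.0 * s
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        a = lo + (i + 0.5) * h
        total += normal_pdf(x - a, c) * normal_pdf(a, s) * h
    return total

for x in (0.0, 0.7, -1.3, 2.0):
    print(x, convolved_density(x, 0.5, 1.0), normal_pdf(x, 1.0))
```

Since the two columns coincide, no likelihood-based criterion can separate the scale $\theta$ from a normal component absorbed into the mixing df, which is the non-identifiability in the completed space referred to above.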







Let $N_\sigma$ denote the normal df with mean 0 and variance $\sigma^2$, and let $H_1 * H_2$
denote the convolution of the two df’s $H_1$ and $H_2$.

It is interesting to note that, without any assumption on $\Gamma$ (except the
necessary identifiability assumption that $G_0$ has no normal component), the minimum
distance method is capable of estimating $(\theta_0, G_0)$ consistently [4]. The difficulty
noted above for the m.l. estimator is avoided by noting the rate at which the
sample df $S_n$ converges to the df $N_{\theta_0} * G_0$ of $X_1$ and estimating $\theta_0$ not by the
value $t$ for which $N_t * H$ is closest to $S_n$ for some normal-free $H$ (this would
encounter the same difficulty as the m.l. estimator, since, the smaller $t$ is taken,
the closer can $N_t * H$ be made to approximate $S_n$), but as the largest value
for which there is an $N_t * H$ suitably close to $S_n$ (“suitably” is connected with
the rate mentioned above).

One could modify the example as considered above so as not to require $G_0$
to have no normal component, and try then to escape the difficulty of
non-identifiability by asking for an estimator of the canonical representation of
$(\theta, G)$, this representation consisting of two df’s, the normal and nonnormal
components of $N_\theta * G$. The previous demonstration then shows that no m.l.
estimator of the canonical representation estimates it consistently, and thus
illustrates, in a simpler setting than that of Example 2 with $k = 1$, why the m.l.
estimator could not be used in Example 2 to estimate the components of the
canonical complex other than $\theta$.

We remark that it is easy in many cases such as that of the present example
to prove a result such as the one that, $(t_n, H_n)$ denoting an m.l. estimator of
$(\theta_0, G_0)$ after $n$ observations, the df $N_{t_n} * H_n$ converges w.p.1 to $N_{\theta_0} * G_0$ as
$n \to \infty$. Such a property is much weaker than that of the consistency of the
m.l. estimator, and does not lie much deeper than the Glivenko-Cantelli theorem.

Example 3b. The case $k > 1$. We assume $f$ to be given by (5.1) with $k > 1$.
The function $g$ is assumed to satisfy the conditions (a), (b), (c), and (d) of
(3.3); conditions (a) and (b) of (3.13), and

\[
\int e^{itx} g(x)\, dx \ne 0 \text{ for almost all real } t. \tag{5.3}
\]

(As in Example 1a, weaker conditions could be assumed here if we assumed
also $\theta \ge c > 0$; the above conditions are analogous to those of Example 1b.)
Thus, for example, (3.2) and (3.4) satisfy these assumptions. $\Omega$ is the set of all
values $\theta > 0$, while $A$ is the real line and $\Gamma$ is the set of all df’s $G$ on $A$ for which

\[
\int [\log |\alpha|]^{+}\, dG(\alpha) < \infty. \tag{5.4}
\]


We now verify the assumptions of Section 2. We define $f(x \mid \theta, \alpha) = 0$ when
$\theta = 0$ or $\infty$ or $\alpha = \pm\infty$. The definition of $f(x \mid \theta, G)$ for $(\theta, G) \in \bar\Omega \times \bar\Gamma$ is then
given by (2.1). Assumptions 1, 2, and 3 are now verified as in Example 1b,
interchanging the roles of $\theta$ and $\alpha$ in the latter (including the definition of $h_\epsilon(x \mid \theta, \alpha)$)
and noting that (3.14) still holds for almost all $(\mu)$ $x$, with this interchange. In
order to verify Assumption 4, we note, for $(\theta, G) \in \Omega \times \Gamma$, that $\theta$ is determined
by the density function of $X_{11} - X_{12}$ and that, for almost all real $t$, the
characteristic function of $G$ is then given by $B(t/k, \dots, t/k)/[C(\theta t/k)]^{k}$ where
$B(t_1, \dots, t_k)$ is the characteristic function of $X_{11}, \dots, X_{1k}$ and $C(t)$ is the
characteristic function of $g$.

Finally, Assumption 5 is a consequence of equation (3.18), which is proved
in the present case exactly as in Example 1b (using (3.15), (3.16), and (3.17),
with $\alpha_1$ replaced by $\theta$ in the latter), and of equation (3.10) (with $f$ defined by
(5.1)). Equation (3.10) in the present example is a consequence of the lemma
at the end of Section 2 and of

\[
\begin{aligned}
E[\log |X_{11}|]^{+} &\le E[\log (|X_{11} - \alpha_1| + |\alpha_1|)]^{+} \\
 &\le E \log [|X_{11} - \alpha_1| + 1] + E[\log |\alpha_1|]^{+} < \infty.
\end{aligned} \tag{5.5}
\]

This completes the verification of the assumptions of Section 2 in Example 3b.
The last paragraph of Section 3 applies also to the present example.


6. The Classical case. Miscellaneous remarks. It does not seem to have been 
noticed in the literature that a simple device exists for proving consistency of the 
m.l. estimator in certain cases where the regularity conditions of published 
proofs fail. This device may be used in the case studied in the present paper (to 
prove consistency in the examples under weaker conditions than those stated) as 
well as in the classical parametric case. We now illustrate this device in an ex- 
ample of the latter case. 

When $\Gamma$ consists only of distributions which give probability one to a single
point, the problem of the present paper becomes the classical problem of estimating
the parameter $\theta$ and the parameter $\sigma$ (say) to which $G_0$ gives probability
one. If $\theta$ may be any real value and $\sigma$ any positive value, then the function
$(1/\sigma)\,g((x - \theta)/\sigma)$ of Section 3 does not satisfy Wald's integrability condition
or the corresponding condition of any other published proof; one verifies easily
that (2.5) is not satisfied for any point in the $(\theta, \sigma)$ half-plane which lies on
the line $\sigma = 0$. (The line $\sigma = 0$ has to be added to $\Omega$ in the process of forming
$\bar{\Omega}$. As in earlier sections, we assume the true $\sigma_0$ to be $> 0$.) Often, however, when
the observations are considered as if they were taken in groups of two or more,
the integrability condition will be satisfied. Such is the case, for example, with
the density function


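$$\frac{\sigma}{\pi[\sigma^{2} + (x_1 - \theta)^{2}]} \times \frac{\sigma}{\pi[\sigma^{2} + (x_2 - \theta)^{2}]}$$

and the normal density function

$$\frac{1}{(2\pi)^{1/2}\sigma} \exp\left\{-\frac{1}{2}\,\frac{(x_1 - \theta)^{2}}{\sigma^{2}}\right\} \times \frac{1}{(2\pi)^{1/2}\sigma} \exp\left\{-\frac{1}{2}\,\frac{(x_2 - \theta)^{2}}{\sigma^{2}}\right\}.$$
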
(Of course the estimator from the normal distribution is known to be consistent,
but this does not alter the validity of the example.) In such cases it
follows from Wald's proof [5] (using the compactification device used above) or
from the result of Example 1b that the m.l. sequence of estimators considered
only after an even number of observations is consistent, and from this it is an
easy matter to show that the entire m.l. sequence of estimators is consistent.

We shall now discuss the integrability conditions of [5] and of the present
paper. The integrability condition (2.5) involves the difference of two logarithms;
the integrability condition as given by Wald in [5] requires the finiteness of the
expected value of each logarithm. The form (2.5) is satisfied whenever the
condition of [5] is, and has one other advantage which we shall now illustrate
by an example. Let the observed chance variable $X$ have density function
$\theta e^{-\theta x}$ for $x > 0$ and zero elsewhere. The parameter $\theta$ is unknown and $\Omega$ is the
positive half-line, so that $\bar{\Omega}$ contains the point $\theta = 0$. One verifies easily that the
condition of [5], and hence (2.5), are satisfied. Suppose now that, instead of
observing $X$, one observes $Y = e^{(e^{X})}$, which therefore has the density function

$$\frac{\theta}{x}\,(\log x)^{-\theta - 1}$$

for $x > e$, and zero elsewhere. One readily verifies that, when $\theta < 1$,

$$E \log\left[\frac{\theta}{Y}\,(\log Y)^{-\theta - 1}\right] = -\infty,$$

so that the condition of [5] is not satisfied when $0 < \theta_0 < 1$. Thus, whether the
condition of [5] is satisfied depends in this instance on whether one observes X 
or Y; this is an unfortunate circumstance, since the estimation problems are in 
simple correspondence. On the other hand, condition (2.5) is invariant under 
one-to-one transformation of the observed chance variable because the numer- 
ator and denominator of the ratio in (2.5) are multiplied by the same Jacobian. 
(In particular, therefore, the chance variable Y satisfies (2.5).) 
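
The steps behind this example can be written out as a short derivation. It is only a sketch under the reading that $X$ has density $\theta e^{-\theta x}$ on $x > 0$ and that $Y = e^{(e^{X})}$; the notation $f_X$, $f_Y$ is introduced here for illustration.

```latex
% Assumed reading: X has density \theta e^{-\theta x} on x > 0, and Y = e^{(e^{X})}.
\begin{align*}
x &= \log\log y, \qquad \frac{dx}{dy} = \frac{1}{y \log y}, \\
f_Y(y \mid \theta) &= \theta e^{-\theta \log\log y} \cdot \frac{1}{y \log y}
   = \frac{\theta}{y}\,(\log y)^{-\theta - 1}, \qquad y > e, \\
E \log f_Y(Y \mid \theta) &= \log\theta - E\, e^{X} - (\theta + 1)\, E X, \\
E\, e^{X} &= \int_0^{\infty} \theta e^{(1 - \theta)x}\, dx = \infty \quad (\theta \le 1), \qquad E X = 1/\theta, \\
\frac{f_Y(y \mid \theta)}{f_Y(y \mid \theta')} &= \frac{f_X(x \mid \theta)}{f_X(x \mid \theta')}
   \quad \text{(the common factor } 1/(y \log y) \text{ cancels)}.
\end{align*}
```

The last line shows why a condition stated in terms of a ratio of densities is unaffected by the change of variable, whereas a condition on each logarithm separately is not.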

Without resorting to artificial or pathologic examples as is sometimes done
in the literature, it is still easy to give instances where the m.l. method does not
give consistent estimators in the classical parametric case. For example, consider
the density function

$$\frac{1}{2(2\pi)^{1/2}} \exp\left\{-\frac{1}{2}(x - \theta)^{2}\right\} + \frac{1}{2(2\pi)^{1/2}\sigma} \exp\left\{-\frac{1}{2}\,\frac{(x - \theta)^{2}}{\sigma^{2}}\right\}$$

of the sequence of independent and identically distributed chance variables
$X_1, X_2, \ldots$. Here $\theta$ and $\sigma$ are the unknown parameters, $\theta$ may be any real
number and $\sigma$ any positive number. It is easy to see that the supremum of the
likelihood function is almost always infinite, no m.l. or modified m.l. estimator
exists, and there are neighborhood m.l. estimators (where $\theta$ is estimated by
$X_1$, say) which are obviously not consistent.
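
The unboundedness of the likelihood can be seen numerically. The following is a minimal sketch, assuming the density is the equal-weight mixture of a normal with mean theta and unit variance and a normal with mean theta and standard deviation sigma; the function name and the five observations are hypothetical and serve only as an illustration.

```python
import math

def mixture_log_likelihood(xs, theta, sigma):
    """Log-likelihood of the assumed mixture 0.5*N(theta, 1) + 0.5*N(theta, sigma^2)."""
    norm = 2.0 * math.sqrt(2.0 * math.pi)
    total = 0.0
    for x in xs:
        z = x - theta
        unit_part = math.exp(-0.5 * z * z) / norm
        # For tiny sigma this term underflows to 0.0 unless z == 0.
        scaled_part = math.exp(-0.5 * (z / sigma) ** 2) / (norm * sigma)
        total += math.log(unit_part + scaled_part)
    return total

# Hypothetical observations, used only to illustrate the effect.
xs = [0.3, -1.2, 2.5, 0.9, -0.4]

# Put theta exactly on the first observation and let sigma shrink:
# the first term grows like -log(sigma), the others stay bounded.
for sigma in (1.0, 1e-2, 1e-4, 1e-8):
    print(sigma, mixture_log_likelihood(xs, xs[0], sigma))
```

With theta fixed at the first observation, the printed log-likelihood keeps increasing as sigma decreases, which is the sense in which the supremum is infinite.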

AVERAGE VALUES OF MEAN SQUARES IN FACTORIALS¹

1. Summary. The assumptions appropriate to the application of analysis of
variance to specific examples, and the effects of these assumptions on the result- 
ing interpretations, are today a matter of very active discussion. Formulas for 
average values of mean squares play a central role in this problem, as do assump- 
tions about interactions. This paper presents formulas for crossed (and, inci- 
dentally, for nested and for non-interacting completely randomized) classifications, 
based on a model of sufficient generality and flexibility that the necessary as- 
sumptions concern only the selection of the levels of the factors and not the 
behavior of what is being experimented upon. (This means, in particular, that 
the average response is an arbitrary function of the factors.) These formulas are 
not very complex, and specialize to the classical results for crossed and nested 
classifications, when appropriate restrictions are made. 

Complete randomization is discussed only for the elementary case of “no
interactions with experimental units,” and randomized blocks are not discussed.
In discussion and proof, we give most space to the two-way classification with 
replication, basing our direct proof more closely on the proof independently ob- 
tained by Cornfield [17], than on the earlier proof by Tukey [20]. We also treat 
the three-way classification in detail. Results for the general factorial are also 
stated and proved. 

The relation of this paper to other recent work, published and unpublished, is 
discussed in Section 4 (average values of mean squares) and in Section 11 (various 
types of linear models). 


INITIAL DISCUSSION


2. Introduction. During the last years of the last decade it was relatively easy 
to believe that the analysis of variance was well understood. Eisenhart’s summary 
article of 1947 [5], when combined with the work of Pitman [13] and Welch [15]
on the randomization approach (work published in 1937-1938, which ever since 
has been far too much neglected), seemed to provide a simple, easily understand- 
able account of the foundations. But as the years have passed, both statisticians 
and users of analysis of variance have gradually become aware of a number of 
areas in which we needed to deepen our understanding. One of these is the rela- 
tion of formulas for average values of mean squares to assumptions. These are 
of central importance, since the choice of an “error term” as a basis for either

Received May 31, 1955; revised May 24, 1956.
¹ Based in part on the Statistical Research Group, Princeton University, Memorandum
Report 18, “Interaction in a row-by-column design,” prepared in connection with research

significance tests or confidence statements must at least take into account average
values of mean squares. (It would be very desirable to consider, also, variances, 
covariances, and distributions of mean squares, particularly if one is concerned 
with the detailed validity of an F-test or a multiple-comparisons procedure. Aver- 
age values, however, seem absolutely essential.) These average values should 
apply when there are real effects and should not be confined to so-called “null”
hypotheses. To derive them we must make assumptions, but they should be as 
weak as possible. 

The present paper deals with one aspect of the problem of average mean 
squares. After introducing this aspect, and describing the situation which we are 
to treat, we shall relate both this aspect to other aspects of current interest and 
our work to the work of others. 

Difficulties in applying the analysis of variance to even a two-way model have 
arisen in the so-called mixed model where rows are sampled and columns are 
completely enumerated. Of two recent standard textbooks, one concludes that 
the average value of both row and column mean squares includes a component 
of variance due to interaction [11]; the other finds such a component in column 
sum of squares, but not in rows [1]. Both texts assume that observed values are 
linear combinations of certain fixed and random variables, but differ in the na- 
ture of the restrictions that are imposed upon these variables. Granting either 
text its assumptions, its conclusions necessarily follow. 

In analyzing the given body of data, the choice among such assumptions can 
lead to quite different answers. The choice is thus an important one. But it is not 
a simple choice. At one stage in the development of his ideas about two-way 
classifications, one of the authors, who leans toward linear models, had 512 
alternative sets of assumptions. If they had led to 512 sorts of analysis, the situa- 
tion would obviously be quite impractical. The only effective way out of such a 
difficulty was to obtain a single flexible model which could be specialized to any 
one of the 512 special possibilities. This was done in [20] (and independently in 
[17] and in [21]).

The question of what assumptions to make seems, at first glance, to be a 
purely empirical question, one that should be referred to the subject-matter 
knowledge of the experimenter, who is the expert on such matters. Sometimes 
this is helpful and sometimes not. But closer study shows that the choice of as- 
sumptions depends on more than empirical questions about the behavior of the 
experimental material. It depends on the nature of the sampling and randomiza- 
tion involved in obtaining the data (as has been recognized by many statisti- 
cians, and recently emphasized by Kempthorne and by Wilk). Moreover, it often 
depends on the purpose of the analysis, as expressed by the situations or popula- 
tions to which one wishes to make statistical inference. These dependences imply 
diversity, and adequate treatment of diversity requires flexibility of as- 
sumption. 

After a general initial discussion (Sections 2-6), various descriptions of the 
situations treated are given (Sections 7-12). The results (Sections 13-19) come 
next, and are followed by the proofs (Sections 20—28). 
3. A pigeonhole model. These requirements of flexibility are met by the system
of assumptions which we are about to describe, which specify an object appro- 
priately called a pigeonhole model. Given the values of each of the factors 
(variables), we are directed to a pigeonhole containing a finite or infinite popula- 
tion. This population represents the possible results of experimentation with these 
values of the factors. So far we have made no assumptions about how the factors 
combine to produce the typical effect of each combination. (This typical effect 
will often be taken as the arithmetic mean of the population of possible values in 
the corresponding pigeonhole.) We have made the assumption that we can recog- 
nize the different levels of each factor, and we shall shortly make assumptions as 
to how we sample this array of pigeonholes. Neither of these is an assumption 
about “how the world behaves”; both are assumptions about the experimenter’s
behavior (and are consequently much easier to check). 

We have emphasized the generality of our assumptions; we must also emphasize 
their limitations. We are concerned with situations where the variability of ex- 
perimental units (plots, reactors, epochs, mice, etc.) does not play such a pre- 
dominant role as to require special attention in the design of the experiment. 
We shall not attempt to treat such well-known situations as randomized blocks,
which have been recently studied by Kempthorne [8] and Wilk [16]. (See the 
next section for a discussion of mutual relations and distinctions.) 

Let us be specific about the case of the two-way classification. Let there be 
RC pigeonholes arranged in R rows and C columns. Let there be at least n ele- 
ments in the population in each pigeonhole. Let a sample of r rows be drawn 
from the R potential rows. Let a sample of c columns be drawn from the C 
potential columns. The rc intersections of a selected row with a selected column 
specify the rc pigeonholes which become the cells of the actual experiment. In 
each of these rc cells, let a sample of n elements be drawn. The values of the 
rcn elements thus obtained are the numbers which are to be analyzed. Assume 
that all the samplings—of rows, of columns, and within pigeonholes—are at random 
and independent of one another. This is the only assumption we shall make. Note 
that it is an assumption about the set-up of the experiment and not about the 
behavior of those things on which the experiment is performed. 
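
The sampling scheme just described can be sketched in code. This is only an illustration for finite populations, assuming simple random sampling without replacement at each stage; the function name `draw_pigeonhole_sample` and the small numeric population are hypothetical.

```python
import random

def draw_pigeonhole_sample(population, r, c, n, rng):
    """Draw r rows, c columns, and n elements per selected cell.

    population[i][j] is the list of values in pigeonhole (row i, column j).
    The three stages use successive draws from the same generator, so they
    are independent of one another.
    """
    R, C = len(population), len(population[0])
    rows = sorted(rng.sample(range(R), r))   # sample of r rows out of R
    cols = sorted(rng.sample(range(C), c))   # sample of c columns out of C
    # Within each of the r*c selected cells, sample n elements.
    cells = [[rng.sample(population[i][j], n) for j in cols] for i in rows]
    return rows, cols, cells

# Hypothetical finite population: R = 4 rows, C = 3 columns, 5 elements per pigeonhole.
rng = random.Random(0)
population = [[[10 * i + j + 0.1 * k for k in range(5)] for j in range(3)]
              for i in range(4)]
rows, cols, cells = draw_pigeonhole_sample(population, r=2, c=2, n=3, rng=rng)
print(rows, cols)
print(cells)
```

Nothing in the sketch refers to how the values inside a pigeonhole arise; only the selection of rows, columns, and elements is specified, which mirrors the single assumption made in the text.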

While the generality of this model is quite apparent, its flexibility may not be 
completely evident. If we choose R = r and C = c, then rows and columns be- 
come fixed, and we have a fixed model which generalizes Model I of Eisenhart 
[5], since (i) neither normality nor constant variance is assumed for the cells, and
(ii) no assumption is made about interactions. 

If, at the other extreme, we take R and C both infinite, we have a “random” 
model which generalizes Model II of Eisenhart, since (i) neither normality nor
constant variance is assumed for the cells, (ii) normality is not assumed for the 
row and column populations, and (iii) no assumption is made about interactions. 

If we take C = c and R infinite, then we obtain a generalization of the conventional
mixed model.

Thus our model is flexible enough to cover all the classical cases, and many 
others besides. 
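
The way the three classical cases arise from the choice of R, r, C, and c can be summarized in a small helper. This is an illustrative sketch only; the function name `model_type` and the label for the intermediate case are not from the paper.

```python
import math

def model_type(R, r, C, c):
    """Name the special case of the pigeonhole model.

    R, C are the population sizes (math.inf for an infinite population);
    r, c are the numbers of rows and columns actually sampled.
    """
    def status(pop, samp):
        if pop == samp:
            return "fixed"            # every potential level is used
        if math.isinf(pop):
            return "random"           # levels sampled from an infinite population
        return "finite-sampled"       # levels sampled from a larger finite population

    rows, cols = status(R, r), status(C, c)
    if rows == cols == "fixed":
        return "fixed model"
    if rows == cols == "random":
        return "random model"
    if {rows, cols} == {"fixed", "random"}:
        return "mixed model"          # one factor fixed, the other infinitely sampled
    return "intermediate case (rows %s, columns %s)" % (rows, cols)

print(model_type(4, 4, 3, 3))
print(model_type(math.inf, 4, math.inf, 3))
print(model_type(math.inf, 4, 3, 3))
print(model_type(10, 4, 3, 3))
```

The last call illustrates the "many others besides": a finite population sampled incompletely falls under none of the three classical labels.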
Our model can be described from two apparently quite different points of
view, namely, as an example of urn sampling or in terms of a very general linear 
model. (By urn sampling we shall mean what is usually referred to as sampling 
without replacements from a finite population.) We shall give both descriptions, 
striving to take as different points of view as we can about them. An understand- 
ing of either description should suffice as a basis for the interpretation of the 
results. 


4. Relation to other problems and workers. Historically, the understanding 
of the average mean squares, and of the formulas relating them to the underlying 
models, has developed separately for the different relations which classifications 
have to one another in the customary situations. The order of development has 
usually been the same. First, average mean squares are obtained under simplify- 
ing assumptions, and then new formulas are obtained as these assumptions are 
relaxed, usually successively. The initial assumptions, whose removal is often
very important, are typically of the following sorts: 

(1) some classification does not interact with one or more others; 
(2) the “errors” are solely (or practically solely) due to the experimental 
units used; 
(3a) the levels of a certain variable are fixed; 
(3b) the levels of a certain variable are a random sample from an infinite 
population. 
When we remove these assumptions, we remove (1) or (2) entirely, and replace 
(3a) or (3b) by 
(3) the levels of a certain variable are a random sample from a population 
of arbitrary size. 
We shall describe assumption (2) as “no free errors” and assumptions (3a) and 
(3b) as “fixed—” and “infinite sampled—” respectively.

The foremost distinction between the different relations between classifica- 
tions is the minimum number of classifications which can enter this relation. 
Thus two (or more) classifications can be crossed or nested. Three (or more) 
classifications are involved in randomized blocks, in a simple fractional factorial, 
or in a Latin square. At least four classifications appear in a lattice, and so on. 

First, let us discuss the relations which need involve only two classifications: 

(1) Nested (or as some say, hierarchical) classifications. Here the general 
situation was clarified first, and general formulas have been available for some 
time. (The early clarification of this case may have been due to its intimate rela- 
tion to the precision of multistage sampling.) 

(2) Crossed classifications. The case of random interactions was treated early 
(cf. [5], [8], etc.), and in varying degrees of generality, the most general being in
[14]. 

For the important case of arbitrary interactions, we know of nothing which 
antedates Memorandum Report 18 [20], which stated a general rule for crossed 
and nested cases. (However, informed understanding of the limiting cases of 
“fixed” and “infinitely sampled” cases seems to have been present among some 
users before this time.) The rule for the limiting cases was stated by Kempthorne 
([8], p. 574). The general results for the two-way classification were obtained by
Cornfield [17] and by Wilk [21], independently of each other and of Memoran- 
dum Report 18. All three approaches to this result were from somewhat different 
points of view. Kempthorne ([19], pp. 204-220) has discussed the general two-way
situation, basing his treatment on [21]. Bennett and Franklin [2] have sketched a 
proof for the two-way classification (without replication) (pp. 474-477), and have 
stated the result for the three-way case (p. 394). 

Wilk and Kempthorne [24] have treated the case of the general two-way classi- 
fication with general sampling of the two factors, with replication, and with 
complete randomization of experimental units, but without free errors or inter- 
actions with experimental units. They have also [23] dealt with the three-way 
classification, with general proportional numbers in a cell, general sampling of all 
factors, complete randomization of experimental units, free errors only restricted 
to be of the same variance, but with no interaction with experimental units. 
Wilk [25] has continued the analysis of the three-way classification, treating both 
interaction with experimental units, and, separately, the analysis of cell means 
for general disproportionate cell numbers. Like the earlier work at Ames, this 
work was carried out independently of ours, in ignorance of the existence of 
[20] and [17], and before the appearance of [2]. 

A short proof, using more special techniques, has been found for the two-way
classification by Hooke [18], who has been able to obtain variances and co- 
variances as well as average values of mean squares. 

(3) Free random allotment (complete randomization). The case of no interaction 
with experimental units was treated in [14] (p. 73) and is discussed briefly in
Section 10 of the present paper. (Other cases appear in [21], [23], [24], and [25].)

The important case of arbitrary interaction was first discussed by Wilk [16], 
who has treated a more complex case in [25]. 

Next come the relations involving a minimum of three classifications. The 
fact that three classifications are necessarily involved does not always appear in 
the corresponding analysis of variance. Thus in a randomized block experiment, 
treatments, blocks, and plots must all appear, though there is no trace of plots in 
the analysis of variance. The recent development of formulas for average mean 
squares for such relations between classifications has been in the hands of Kemp- 
thorne and Wilk, as the detailed summary will now show. The present authors 
have done no work in this area. 

(4) Randomized blocks. The case of no interaction with experimental units 
(with plots) is classical, and formulas are well known. 

The important case of interaction with experimental units was first treated, 
under certain restrictions, by Neyman (with cooperation of Iwaskiewicz and 
Kolodzieczyk) [12]. Some particular cases were followed up by McCarthy [10]. 
The case of arbitrary interaction (not distinguishing technical errors) was first 
discussed in the book of Kempthorne ([8], Sec. 8.4). This treatment was extended 
to the case where each treatment appears p > 1 times per block by Wilk [16]. 

(5) Randomized fractionation (as in the classical Latin square where rows and
columns refer to experimental units). The cases where all interactions are assumed
zero are classical. 

The case of a somewhat restricted interaction of treatments with experimental 
units was approached by Neyman (with cooperation of Iwaskiewicz and Kolod- 
zieczyk) [12]. We understand that this problem is treated in [23], which we have 
not seen, and in “complete generality” in further unpublished work of Wilk and
Kempthorne. 

(6) Crossed fractionation (as in a simple fractional factorial without blocks). 
Here the results for no interactions are classical. Nothing else seems available. 

(7) Mixed fractionation (as in a “Latin square” with two factors and one family
of blocks). As for (6). 

The work on this class of relations has been restricted to randomized blocks 
and Latin squares. Thus its importance depends very greatly on the field of ex- 
perimentation considered. 

In much of agriculture, and in many related fields, the thought of not having 
blocks within which to randomize never occurs to the statistician. The errors 
he faces are large, even in small blocks, and variability from experimental unit 
to experimental unit may dominate all other sources of variation. Blocks are all 
important, and confounding with blocks is common. He is almost, but not quite, 
justified in refusing the adjective “experimental” to a situation without blocks 
and in calling it “sampling” instead. But there are areas of inquiry, in parts of 
modern industrial technology, in the study of many measurement processes, 
and in other areas (usually far from biology), where errors are relatively small 
(compared to high-order interactions), blocks can be very large, and complete 
randomization is the order of the day. In these latter areas, treatment of nesting, 
crossing, and random allotment suffices. In the former areas, treatment of ran- 
domized blocks comes first, though it needs to be supplemented with treatments 
of the simpler relations. The problems selected for initial attack by the inde- 
pendent groups working on average values of mean squares reflect their back- 
grounds. 

We should come now to relations involving a minimum of four classifications 
(for example, simple lattices). But no work seems to have been done beyond the 
initial classical results for no interaction. 

The present paper is concerned with the crossed classifications and, in the 
statement of rules, with combinations of crossed, nested, and noninteracting 
completely randomized classifications. In its general presentation and discussion 
it undoubtedly makes use not only of ideas from the references and research 
reports cited, but also of the valuable personal discussions which its authors have 
had with almost all the persons mentioned above, with H. Fairfield Smith, 
Franklin E. Satterthwaite, the late Charles P. Winsor, and others. It would be 
impossible for us now to assign specific credit relating to specific ideas to spe- 
cific persons. We owe a particular debt to Kempthorne and Wilk for illuminating 
discussions. 


5. The two spans of the bridge of inference. In almost any practical situation 
where analytical statistics is applied, the inference from the observations to the 
real conclusion has two parts, only the first of which is statistical. A genetic
experiment on Drosophila will usually involve flies of a certain race of a certain 
species. The statistically based conclusions cannot extend beyond this race, yet 
the geneticist will usually, and often wisely, extend the conclusion to (a) the 
whole species, (b) all Drosophila, or (c) a larger group of insects. This wider 
extension may be implicit or explicit, but it is almost always present. If we take 
the simile of the bridge crossing a river by way of an island, there is a statistical
span from the near bank to the island, and a subject-matter span from the island 
to the far bank. Both are important. 

By modifying the observation program and the corresponding analysis of 
the data, the island may be moved nearer to or farther from the distant bank, 
and the statistical span may be made stronger or weaker. In doing this it is 
easy to forget the second span, which usually can only be strengthened by im- 
proving the science or art on which it depends. Yet a balanced understanding of, 
and choice among, the statistical possibilities requires constant attention to the 
second span. It may often be worth while to move the island nearer to the distant 
bank, at the cost of weakening the statistical span—particularly when the sub- 
ject-matter span is weak. 

In an experiment where a population of C columns was specified, and a sample 
of c columns was randomly selected, it is clearly possible to make analyses where 

(1) the c columns are regarded as a sample of c out of C, or 

(2) the c columns are regarded as fixed. 
The question about these analyses is not their validity but their wisdom. Both 
analyses will have the same mean, and will estimate the effects of rows iden- 
tically. Both analyses will have the same mean squares, but will estimate the 
accuracy of their estimated effects differently. The analyses will differ in the 
length of their inferences; both will be equally strong statistically. Usually it 
will be best to make analysis (1) where the inference is more general. Only if 
this analysis is entirely unrevealing on one or more points of interest are we 
likely to be wise in making analysis (2), whose limited inferences may be some- 
what revealing. 
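
That both analyses share the same mean squares follows from the fact that the mean squares are computed from the rcn observed values alone; the population size C enters only when the mean squares are interpreted. A minimal sketch, using the standard textbook definitions for a two-way layout with n replicates per cell (the function name and the small data set are hypothetical):

```python
def two_way_mean_squares(y):
    """Mean squares for a two-way layout with replication.

    y[i][j] is the list of n replicate values in the cell (row i, column j).
    Requires at least 2 rows, 2 columns, and 2 replicates per cell.
    Note that neither R nor C appears anywhere in the computation.
    """
    r, c, n = len(y), len(y[0]), len(y[0][0])
    cell = [[sum(y[i][j]) / n for j in range(c)] for i in range(r)]
    row = [sum(cell[i]) / c for i in range(r)]
    col = [sum(cell[i][j] for i in range(r)) / r for j in range(c)]
    grand = sum(row) / r

    ms_rows = c * n * sum((m - grand) ** 2 for m in row) / (r - 1)
    ms_cols = r * n * sum((m - grand) ** 2 for m in col) / (c - 1)
    ms_inter = n * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                       for i in range(r) for j in range(c)) / ((r - 1) * (c - 1))
    ms_within = sum((v - cell[i][j]) ** 2
                    for i in range(r) for j in range(c)
                    for v in y[i][j]) / (r * c * (n - 1))
    return {"rows": ms_rows, "columns": ms_cols,
            "interaction": ms_inter, "within": ms_within}

# Hypothetical data: r = 2 rows, c = 2 columns, n = 2 replicates per cell.
data = [[[1.0, 3.0], [2.0, 4.0]],
        [[5.0, 7.0], [6.0, 8.0]]]
print(two_way_mean_squares(data))
```

Analyses (1) and (2) would start from the same four numbers and differ only in which average-value formulas are used to choose an error term.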

But what if it is unreasonable to regard c columns as any sort of a fair sample
from a population of C columns with C > c? We can (at least formally and numerically)
carry out an analysis with, say, C = ∞. What is the logical position
of such an analysis? It would seem to be much as follows: We cannot point to a 
specific population from which the c columns were a random sample, yet the final 
conclusion is certainly not to just these c columns. We are likely to be better 
off to move the island to the far side by introducing an unspecified population 
of columns “like those observed” and making the inference to the mean of this 
population. This will lengthen the statistical span at the price of leaving the loca- 
tion of the far end vague. Unless there is a known, fixed, number of reasonably 
possible columns, this lengthening and blurring is likely to be worth while. 

This discussion follows the line of the classical discussion of “selecting the 
right error term,” as developed by Fisher and expounded by many statisticians,
with two considerations rarely faced except in careful discussions of groups or
series of experiments (cf. Chapter 28 of [8] for references): 
(1) We admit that more than a single analysis of given data may have 
“correctness.” 
(2) We have tried to state the uncertainties of the post-facto C = ∞ choice
a little more specifically than usual. 
In any case, however, this discussion illustrates one way in which the nature of 
the appropriate analyses of variance depends on the purposes of the analysis. 


6. The varied roles of randomization. Emphasis on randomization of arrange- 
ment entered modern statistics with the analysis of variance—in the early work 
of R. A. Fisher. The year 1935, in which Neyman (with cooperation of Iwaskie- 
wicz and Kolodzieczyk) [12] discussed the problem of interaction with experi- 
mental units, was marked by the appearance of The Design of Experiments, in 
which Fisher stressed both the role of randomization as a guarantor of the valid- 
ity of an experiment and the close correspondence, in certain examples, of tests 
of significance based on randomization (assuming no interaction with experi- 
mental units) with those based on an assumption of normality of distribution. 

Two years later, in 1937, the first papers of Pitman’s series on randomization 
appeared. Pitman was seeking tests of significance which would be independent 
of the underlying distribution, naturally tried for randomization tests, and was 
much surprised when the natural approximation to these tests turned out to be 
the classical normal theory tests. This series of papers culminated in his Bio- 
metrika paper [13] which dealt with randomized blocks. In the meantime, Welch 
[15], had applied similar methods to both randomized blocks and Latin squares. 
In all of these papers, the assumption was made that there were no interactions 
with experimental units. As clearly stated, the motivation of these papers was 
to obtain tests which would apply to any distribution of errors, and randomiza- 
tion was used to mediate this independence of distribution. 

The treatment of Pitman and of Welch went far beyond the subject of the 
present paper, the average values of mean squares. They dealt with a function 
of the ratio of mean squares, and obtained a number of moments. For the cases 
they treated, their results go farther than our knowledge for any other situations. 
They treated randomized blocks and Latin squares explicitly. Implicitly their 
results cover any less restrictive randomization; in particular, their results also 
cover complete randomization. Their work was carried out for the case of one 
classification of treatments. Implicitly, it applies to cases of two or more treat- 
ment classifications, but only if these treatment classifications do not interact. 

We have today no comparable basis for the analysis of factorial experiments 
where interactions may be present and where, hence, the main effects will usually 
be compared with interaction. Our knowledge is limited to average values of 
mean squares, except for the work of Hooke [18] on second moments in the two- 
way case. To reach a situation comparable to the case of a single classification 
of treatments will require an extension of Hooke’s work, both to higher moments 
and to more-way classifications. The essential difficulty is the probable existence 
and possible perversity of interactions. No assumptions about randomization
will allow us to avoid facing them. For the present, presumably, we shall con- 
tinue to base our inferences on average values of mean squares, and try to com- 
fort ourselves with distant and tenuous analogies with the single treatment 
classification case where, if interactions with experimental units be absent, 
Fisher, Pitman, and Welch have shown us that the situation is rather pleasant. 

If one is dealing with situations where the contribution of the experimental
units to variability is large or even dominant, it is easy to alter the role of ran- 
domization somewhat. Instead of thinking of it as a mediator which assures the 
validity of significance tests for any shape of error distribution, as Pitman did, 
one can think of it in itself. (If the effects of experimental units dominate all 
other sources of variation, it is not only easy but necessary.) This point of view 
was vigorously taken up by Kempthorne [8], [9], and has strongly motivated the 
work of Wilk and Kempthorne [21], [22], [23], [24], [25], [26]. 

While we hold rather definite views, we do not feel that the issues involved 
have been finally settled. We do feel, however, that a knowledge of what the 
issues are is essential in understanding just how far average values of mean 
squares take us and how our work is related to that of others. 


DESCRIPTION OF SITUATIONS TREATED 


7. A description from the point of view of urn sampling. We consider a finite 
or infinite number of elements (possible measurement results on animals, plots, 
batches, samples, etc.). These elements are classified into R rows (litters, days,
blocks, pressures, temperatures, times, etc.) and C columns (doses, operators,
treatments, temperatures, times, catalysts, etc.). Each of the RC pigeonholes 
thus formed contains N elements. In some circumstances it is natural to consider 
the RCN elements as a population, in others it would be unnatural. 

A sample of r rows is taken, each row having probability r/R of being selected. 
Given that a particular row has been selected, the conditional probability that 
any other given row will have been selected is (r - 1)/(R - 1). 

Similarly a sample of c columns is taken, each column having probability 
c/C of being selected. Given that a particular column is in the sample, the 
conditional probability that any other given column will be in the sample is 
(c - 1)/(C - 1). 

Every pigeonhole located in a sampled row and a sampled column is thus a 
cell included in the sample. The pattern of cells so obtained might well be called 
a bisample, since it is defined by two samples, one of rows and one of columns. 
In each such cell n elements are sampled, each element in the sampled cell having 
probability n/N of being selected. Given that a particular element has been se- 
lected from this cell, the conditional probability that any other element in the 
cell has also been selected is (n — 1)/(N — 1). 
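
As an illustration only, the three stages of sampling just described might be sketched as follows. The sketch assumes simple random sampling without replacement at every stage, which is one scheme having the stated selection probabilities; the function name `draw_bisample` and the array `X` are hypothetical.

```python
import numpy as np

def draw_bisample(X, r, c, n, rng):
    """Draw r rows, c columns, and n elements from each sampled cell.

    X has shape (R, C, N): R rows, C columns, N elements per pigeonhole.
    Every stage uses simple random sampling without replacement (an
    assumption of this sketch), so a row enters with probability r/R and
    a second given row, conditionally, with probability (r - 1)/(R - 1).
    """
    R, C, N = X.shape
    rows = np.sort(rng.choice(R, size=r, replace=False))
    cols = np.sort(rng.choice(C, size=c, replace=False))
    x = np.empty((r, c, n))
    for i, I in enumerate(rows):
        for j, J in enumerate(cols):
            # Elements are drawn independently in each sampled cell.
            picked = rng.choice(N, size=n, replace=False)
            x[i, j, :] = X[I, J, picked]
    return rows, cols, x

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4, 6))      # hypothetical underlying array, RCN = 120
rows, cols, x = draw_bisample(X, r=3, c=2, n=2, rng=rng)
print(rows, cols, x.shape)
```

The observed array `x` then holds the rcn values that enter the analysis of variance.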

These are the only assumptions we shall need (for the situation with which we 
are concerned). From the point of view of urn sampling, we have only to define 
the variance components corresponding to such an array of RCN elements. 
Notice that our definitions of variance components apply to fixed models and 
mixed models as well as to random ones. There has been a tendency to refer 
to random models as variance components models, presumably because the 
analysis associated with a fully random model is most likely to be directed 
toward the estimation of variance components. This choice of words has already 
given rise to confusion in connection with mixed models, and its continued use 
will undoubtedly cause other complications. We recommend that it be discarded. 

Since we have to deal with two sets of rows, columns, and cells, one set in the 
original array of RCN elements and one in the array of rcn elements we actually 
have at hand, it is unusually important to make a clear distinction in our 
notation. We shall do this by using capital letters for all that has to do with the 
underlying array and lower case for all that has to do with the observed array. Thus 
    $x_{ijk}$ = value of the kth element in the jth column in the ith row in the 
                observed array, 
    $X_{IJK}$ = value of the Kth element in the Jth column in the Ith row in the 
                underlying array. 
(This convention will be altered in Sections 23ff.) We shall indicate an unweighted 
mean over observed values by a dot in place of the subscript averaged over, 
and over underlying values by a dash in place of the subscript averaged over. 
Thus 
    $x_{ij\cdot}$ = mean value of all observed elements in the ith row and jth column, 
    $X_{IJ-}$ = average value of all underlying elements in the pigeonhole in the Ith 
                row and Jth column. 
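We can now define the variance components of the underlying array by 

\[ \sigma_R^2 = \frac{1}{R-1} \sum_I (X_{I--} - X_{---})^2, \]
\[ \sigma_C^2 = \frac{1}{C-1} \sum_J (X_{-J-} - X_{---})^2, \]
\[ \sigma_{RC}^2 = \frac{1}{(R-1)(C-1)} \sum_I \sum_J (X_{IJ-} - X_{I--} - X_{-J-} + X_{---})^2, \]
\[ \sigma_E^2 = \frac{1}{RC(N-1)} \sum_I \sum_J \sum_K (X_{IJK} - X_{IJ-})^2. \]

These formulas reduce to the standard ones in all familiar special cases. 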

We observe that the variance components thus defined are exactly the mean 
squares we would obtain if all the values in the entire pigeonhole model were 
subjected to analysis of variance, except for factors of N, NR or NC. 
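
The remark above can be checked numerically. The following is a minimal sketch, assuming that the variance components are the usual ones with divisors R - 1, C - 1, (R - 1)(C - 1), and RC(N - 1); the function names and the test array are hypothetical.

```python
import numpy as np

def variance_components(X):
    """Variance components of an underlying array X of shape (R, C, N)."""
    R, C, N = X.shape
    cell = X.mean(axis=2)            # X_{IJ-}
    row = cell.mean(axis=1)          # X_{I--}
    col = cell.mean(axis=0)          # X_{-J-}
    grand = cell.mean()              # X_{---}
    inter = cell - row[:, None] - col[None, :] + grand
    return {
        "R": ((row - grand) ** 2).sum() / (R - 1),
        "C": ((col - grand) ** 2).sum() / (C - 1),
        "RC": (inter ** 2).sum() / ((R - 1) * (C - 1)),
        "E": ((X - cell[:, :, None]) ** 2).sum() / (R * C * (N - 1)),
    }

def full_array_mean_squares(X):
    """Ordinary two-way mean squares with the whole array treated as data."""
    R, C, N = X.shape
    cell = X.mean(axis=2)
    row, col, grand = cell.mean(axis=1), cell.mean(axis=0), cell.mean()
    inter = cell - row[:, None] - col[None, :] + grand
    return {
        "R": N * C * ((row - grand) ** 2).sum() / (R - 1),
        "C": N * R * ((col - grand) ** 2).sum() / (C - 1),
        "RC": N * (inter ** 2).sum() / ((R - 1) * (C - 1)),
        "E": ((X - cell[:, :, None]) ** 2).sum() / (R * C * (N - 1)),
    }

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3, 5))       # hypothetical underlying array
vc = variance_components(X)
ms = full_array_mean_squares(X)
R, C, N = X.shape
# The mean squares are the components times NC, NR, N, and 1 respectively.
ratios = {"R": N * C, "C": N * R, "RC": N, "E": 1}
```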

The description we have just given is both less and more general than the 
model of Section 3. It is less general because we assumed that all populations 
were of the same size. It is more general because we did not require that the 
sampling be purely at random, but only that any pair of rows, columns, or ele- 
ments was as likely to enter the sampling as any other. We shall now see that 
these differences are nonessential. 

We assumed a constant population size for the purely expository reason that 
we had not introduced the subscripts when we mentioned population size. We 
can now alter three sentences or clauses to read: “The IJth pigeonhole contains 
$N_{IJ}$ elements,” “each element in the sampled cell having probability $n/N_{IJ}$ 
of being selected,” “the conditional probability ... is $(n - 1)/(N_{IJ} - 1)$.” 
After this alteration the urn sampling model is at least as general as our initial 
model (see Section 3). 

In the initial model we assumed that all samplings were ‘“‘at random.” Since 
we are only concerned with averages of quadratic functions of the sampled 
elements, it would be merely an application of a general principle to conclude 
that if the results hold for sampling “at random,” they must also hold for 
sampling “in which individuals and pairs have equal chances of being selected.” 
We may, if we wish, consider that the initial model has been so generalized. The 
initial model and the urn sampling model then become equivalent. 


8. Linear models with “tied” interaction. For convenience in exposition, we 
shall take n = 1 and drop the index k for most of this section. Under these 
conditions, the conventional linear model of most texts would appear like this: 

\[ x_{ij} = \theta + \xi_i + \eta_j + \omega_{ij}, \]

where the $\xi_i$ are the row contributions (sometimes called “effects”), the $\eta_j$ 
are the column contributions, and the $\omega_{ij}$ are error or discrepance contributions. 
Various normalizing conditions may or may not be applied. The $\xi$’s, the $\eta$’s, 
and the $\omega$’s will be variously assumed to be fixed or random samples from 
infinite (or perhaps finite) populations. But the key assumption will be that the 
variation of the $\omega$’s is independent of what the $\xi$’s and $\eta$’s may be. It is this 
assumption of independence which has made the use of such linear models so 
special and dangerous. 


In particular, this model cannot accommodate the following situation, easily 
described in terms of four pigeonholes: 

    N(0, 1) | N(2, 1) 
    --------+-------- 
    N(2, 1) | N(0, 1) 

where $N(\mu, \sigma^2)$ stands for a normally distributed infinite population with average 
value $\mu$ and variance $\sigma^2$. The independence assumption is an assumption about 
the behavior of the world, and not just about how we do experiments. 

Although some have expressed doubts, there are real advantages to linear 
models. It has therefore been worth while to learn how to generalize linear models to 
apply without assumption. 

Let us go, therefore, to a situation with R potential rows and C potential 
columns, where I designates a potential row and J designates a potential column. 
If f(I, J) is an arbitrary function of I and J, we may choose to define 

\[ \theta = f(-, -), \]
\[ \xi_I = f(I, -) - f(-, -), \]
\[ \eta_J = f(-, J) - f(-, -), \]
\[ \lambda_{IJ} = f(I, J) - f(I, -) - f(-, J) + f(-, -), \]

where replacement of an “I” by a “-” implies averaging over all R potential 
rows and replacement of a “J” by a “-” implies averaging over all C potential 
columns. (Note that other definitions are possible.) 

We shall then have 

\[ f(I, J) = \theta + \xi_I + \eta_J + \lambda_{IJ}, \]

where 

\[ \lambda_{I-} = \text{a constant} = \lambda_{-J} \]

(and where indeed this constant is zero, and $\sum_I \xi_I = 0$ and $\sum_J \eta_J = 0$, although 
we shall not require any of these to vanish when we use this model). 
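
For concreteness, this decomposition can be carried out on the averages of the four-pigeonhole example given earlier in this section. The sketch below is illustrative; the function name `decompose` is hypothetical, and equally weighted averages over rows and columns are assumed, as in the definitions above.

```python
import numpy as np

def decompose(f):
    """Split an R x C array f(I, J) into theta, xi_I, eta_J, lambda_IJ."""
    f = np.asarray(f, dtype=float)
    theta = f.mean()                          # f(-, -)
    xi = f.mean(axis=1) - theta               # f(I, -) - f(-, -)
    eta = f.mean(axis=0) - theta              # f(-, J) - f(-, -)
    lam = f - xi[:, None] - eta[None, :] - theta
    return theta, xi, eta, lam

# Averages of the four pigeonholes N(0, 1), N(2, 1), N(2, 1), N(0, 1).
f = np.array([[0.0, 2.0],
              [2.0, 0.0]])
theta, xi, eta, lam = decompose(f)
print(theta, xi, eta)
print(lam)
```

Here the row and column contributions all vanish and the whole pattern is carried by the interactions, which is why a model with independent discrepancies cannot represent it.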

Now if we pick r rows out of R and c columns out of C, we may designate actual 
rows and columns with i’s and j’s, and write I(i) for the potential row 
corresponding to actual row i, and J(j) for the potential column corresponding to the 
actual column j. If $x_{ij}$ is the value of f[I(i), J(j)], we have 

\[ x_{ij} = \theta + \xi_i + \eta_j + \lambda_{ij}, \]

where we have written $\xi_i$ for $\xi_{I(i)}$, $\eta_j$ for $\eta_{J(j)}$, and $\lambda_{ij}$ for $\lambda_{I(i)J(j)}$. We have 
here an additive model for a bisample of rows and columns drawn from an 
arbitrary R × C array of constants. How does this differ from the “independent” 
model in which $\omega_{ij}$ is assumed independent of $\xi_i$ and $\eta_j$? How is it to be 
described? 

It differs in that the $\lambda_{ij}$ are “tied” to the corresponding $\xi_i$ and $\eta_j$. They are 
“tied” in a very specific way, however. It need not be true that $\lambda_{ij}$ is a 
well-defined function of the values of $\xi_i$ and $\eta_j$, for there may be, for example, 
various pairs of values of I and J for which the corresponding $\xi_I$ and $\eta_J$ have the 
same values but the corresponding $\lambda_{IJ}$ are not equal. Thus $\lambda_{ij}$ is tied to $\xi_i$ and 
$\eta_j$ through the values of I(i) and J(j) rather than through the values of $\xi_i$ 
and $\eta_j$. 

But what if the pigeonholes contain populations, and we sample one element 
from each actual cell? Can this be treated similarly? Quite easily. Let 

\[ f(I, J) = X_{IJ-} = \frac{1}{N_{IJ}} \sum_K X_{IJK}, \]
\[ \Omega_{IJK} = X_{IJK} - f(I, J). \]

Then, if the sampling in each cell is at random and independent, $\omega_{ij}$ is a random 
sample of 1 from the population of the $\Omega_{I(i)J(j)K}$, whose average value is zero, and 

\[ x_{ij} = X_{I(i)J(j)K} = f(I(i), J(j)) + \omega_{ij} = \theta + \xi_i + \eta_j + \lambda_{ij} + \omega_{ij}, \]

where, although $\lambda_{ij}$ and $\omega_{ij}$ have the same indices, they represent quite different 
sorts of quantities. The $\lambda$’s are interactions tied to the $\xi$’s and the $\eta$’s, while the 
$\omega$’s are independent fluctuations. 


9. Main effects and main contributions. All those who use the analysis of 
variance are familiar with the words “main effect,” but far fewer have any really 
clear understanding of what they mean. Yet all specific analysis of variance 
procedures imply very specific interpretations of what it is that concerns us. 
Any practically satisfactory structure for the analysis of variance must bring 
the essential definitions into the limelight. 

Because there is likelihood of confusing the quantity in the model and its 
estimate derived from the observations, we call the quantity in the model a 
main contribution. This is its name, but what is it? Usually only a relative 
definition makes sense. We talk of main contributions, but we work with their 
differences. What then does the difference $\xi_I - \xi_K$ mean in a situation modeled by 

\[ x_{ij} = \theta + \xi_i + \eta_j + \lambda_{ij} + \omega_{ij} \]

with the two conditions 

\[ \lambda_{I-} = \text{a constant}, \]
\[ \mathrm{ave}\,\{\omega_{ij}\} = 0? \]

Let, again, 

    f(I, J) = average response in the IJ pigeonhole; 

then 

\[ \theta + \xi_I + \eta_J + \lambda_{IJ} = f(I, J), \]

and we easily find that 

\[ \xi_I - \xi_K = f(I, -) - f(K, -) = \frac{1}{C} \sum_J [f(I, J) - f(K, J)]. \]

This states that the difference in main contributions between the Ith and Kth 
rows is the average over all C potential columns of the average effect of changing 
from the Ith to the Kth row. The average is over all potential columns, and for 
the present definition is equally weighted. What is most important in defining 
row main contributions is our choice of what are the potential columns. 

This is the heart of the problem of main contributions—it is our attitude to- 
ward the other factors which affects the definition of a main contribution. A 
change from considering rows as fixed to considering rows as a sample from a large 
population need have no effect at all on the definition of row main contributions, 
but it will substantially alter the definitions of the column main contributions. 
It is only when we have relatively explicit definitions that we can force ourselves 
to recognize this fact. 

We stated that an alteration in the set of potential rows need not alter the 
definition of the row contributions. This is so because we have not required that 
$\sum_I \xi_I = 0$. We have not made this requirement because it serves no useful 
purpose. (Its imposition would make all the $\xi_i$ estimable from an experiment, but 
there are two reasons why this would not be useful: (i) Because we are interested 
in estimating only the differences $\xi_I - \xi_K$ anyway. (ii) Because, usually, only 
some of the $\xi_I$ appear as $\xi_i = \xi_{I(i)}$, and since the others can surely not be 
estimated, what do we care about one more unestimable parameter? By making 
one more parameter unestimable, we have gained inestimable freedom.) 

It is in presenting explicit quantities which represent main contributions 
that this linear model—the generalized, nonindependent linear model—has a 
significant and specific role. It contains exactly the same definition of main 
contributions, but how many readers recognized this as they read this section? 
It is concealed in such urn formulas as 

\[ \sigma_R^2 = \frac{1}{R-1} \sum_I (X_{I--} - X_{---})^2, \]

which implicitly state that what we shall ask the “row” line of the analysis of 
variance to inform us about, are the $X_{I--}$. The difference $X_{I--} - X_{K--}$ is 
identical with $\xi_I - \xi_K$, but the linear model brings the situation out with 
greater force and clarity. 


10. Description by linear models. We now set out the general linear model 
in the form 


\[ x_{ijk} = \theta + \xi_i + \eta_j + \lambda_{ij} + \omega_{ijk}, \]

where $1 \le i \le r$, $1 \le j \le c$, $1 \le k \le n$, and the assumptions are as below, 
where in dealing with simple finite populations, we shall always use the modern 
definition of variance, dividing by one less than the number of elements involved. 
Thus, for example, the variance of the population of potential columns is 

\[ \sigma_C^2 = \frac{1}{C-1} \sum_{J=1}^{C} (\eta_J - \eta_-)^2, \]

where $\eta_-$ is the average of $\eta_J$ over the population of potential columns, 

\[ \eta_- = \frac{1}{C} \sum_{J=1}^{C} \eta_J. \]

In general, we shall use dashes for averages of population quantities, and dots 
for means of sampled quantities. 

With these preliminaries we can state the assumptions about $\theta$, $\xi_i$, and the 
$\eta_j$ which we wish to apply: 

(1) There is a population of general contributions, of variance $\sigma_G^2$, and $\theta$ 
    is a random sample of 1 from this population. 

(2) There is a population of row contributions of size R and variance $\sigma_R^2$, 
    and $\xi_1, \xi_2, \ldots, \xi_r$ are a random sample of size r from this population. 

(3) There is a population of column contributions of size C and variance 
    $\sigma_C^2$, and $\eta_1, \eta_2, \ldots, \eta_c$ are a random sample of size c from this 
    population. 

(4) There is a two-way array of interactions $\lambda_{IJ}$, one for each row 
    contribution and column contribution in the corresponding populations, the 
    averages (over J) $\lambda_{I-}$ are independent of I, the averages (over I) $\lambda_{-J}$ 
    are independent of J, and the interaction actually occurring in the 
    expression for $x_{ijk}$ is $\lambda_{I(i)J(j)}$; that is to say, it is the interaction which 
    corresponds to the column contribution and row contribution which occur 
    in the expression for $x_{ijk}$. 

(5) The sampling in (1), (2), and (3) takes place independently. 

We define, for reference, 


\[ \sigma_{RC}^2 = \frac{1}{(R-1)(C-1)} \sum_I \sum_J (\lambda_{IJ} - \lambda_{--})^2
                 = \frac{1}{(R-1)(C-1)} \sum_I \sum_J (\lambda_{IJ} - \lambda_{I-} - \lambda_{-J} + \lambda_{--})^2. \]

(Note that we do not use (RC - 1) as a divisor.) 

There remain the assumptions about the $\omega_{ijk}$, where we still have much choice, 
so long as we require that 

\[ \mathrm{ave}\,\{\omega_{IJK}\} = \text{a constant independent of I and J.} \]


The assumption which most exactly corresponds to the pigeonhole model is the 
following: 
(6’) For each of the RC combinations of a population row with a population 
     column there is a population of size $N_{IJ}$, average value $\omega_{--}$ (the same 
     for all IJ), and variance $\sigma_{IJ}^2$. The $\omega_{ijk}$ are a sample of n from the 
     I(i)J(j)th population. Sampling is at random in each cell and independent, 
     both between cells and of the samplings in (1), (2), and (3). 
With this choice, the generalized linear model corresponds exactly to the 
pigeonhole model, provided we place $\sigma_G^2 = 0$ and thus keep the general contribution 
constant. 

The variance component for repetition (or “error”) $\sigma_E^2$, and the effective 
population size N, are to be found from 

\[ \sigma_E^2 = \frac{1}{RC} \sum_I \sum_J \sigma_{IJ}^2, \]
\[ \left(1 - \frac{n}{N}\right) \sigma_E^2 = \frac{1}{RC} \sum_I \sum_J \left(1 - \frac{n}{N_{IJ}}\right) \sigma_{IJ}^2. \]
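
As a small worked illustration, the two relations can be solved for the error component and the effective population size. The sketch assumes the relations read as an unweighted mean of the cell variances and an unweighted mean of the cell variances shrunk by 1 - n/N_IJ; the function name and the numbers are hypothetical, and all cell sizes are taken to be finite.

```python
import numpy as np

def error_component_and_effective_size(cell_var, cell_size, n):
    """Return (sigma_E^2, N) from cell variances and finite cell sizes.

    Assumed relations (a reading of the formulas in the text):
        sigma_E^2             = mean of sigma_IJ^2
        (1 - n/N) sigma_E^2   = mean of (1 - n/N_IJ) sigma_IJ^2
    """
    cell_var = np.asarray(cell_var, dtype=float)
    cell_size = np.asarray(cell_size, dtype=float)
    sigma2_E = cell_var.mean()
    shrunk = ((1.0 - n / cell_size) * cell_var).mean()
    # Solve (1 - n/N) * sigma2_E = shrunk for N.
    N_eff = n / (1.0 - shrunk / sigma2_E)
    return sigma2_E, N_eff

cell_var = [[1.0, 4.0], [1.0, 4.0]]        # hypothetical sigma_IJ^2
cell_size = [[10, 20], [10, 20]]           # hypothetical N_IJ
sigma2_E, N_eff = error_component_and_effective_size(cell_var, cell_size, n=2)
print(sigma2_E, N_eff)
```

Under these assumptions N does not depend on n: it is a harmonic mean of the cell sizes weighted by the cell variances, so cells with large variance count most.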
We remarked that (6’) was not the only choice for a linear model. There are 
many. We shall give detailed consideration to only one other, namely: 
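(6”) The cell contributions are 
\[ \omega_{ijk} = \omega'_{ijk} + \omega''_{ij}, \]
     where the $\omega'_{ijk}$ satisfy the conditions of (6’) while the $\omega''_{ij}$ are the result 
     of independently randomizing a set of rc values of variance $\sigma''^2$ among the 
     rc actual cells. 
We shall now need to set 

\[ \sigma_E^2 = \frac{1}{RC} \sum_I \sum_J \sigma_{IJ}^2, \]
\[ \left(1 - \frac{n}{N}\right) \sigma_E^2 = n\,\sigma''^2 + \frac{1}{RC} \sum_I \sum_J \left(1 - \frac{n}{N_{IJ}}\right) \sigma_{IJ}^2 \]

(the last of which may lead to negative N’s). 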

Clearly (6”) is more complicated than (6’), which matched the pigeonhole 
model. Why then do we wish to consider this added complexity? Because ex- 
periments are often more complicated than the simple pigeonhole model makes 
allowance for. For example, let us suppose that we wish to study a chemical 
reaction at all combinations of 6 specific pressures and 7 specific temperatures. 
The pigeonhole model would naturally have 6 rows and 7 columns, and in each 
pigeonhole we would put an infinite population. The pattern of the averages of 
these populations would represent the response of this process to pressure and 
temperature. The variations within each of these populations would represent 
fluctuations in process behavior and measurement. But we would be ill advised 
to stop here. Every well-trained statistician would insist, especially if every 
combination of pressure and temperature were to be tried once only, that the order in which the 
experiments were performed should be randomized. He would do this in fear 
of systematic errors somehow associated with time. In other words, in fear that 
the w;; of (6”) were not all zero. We need very great flexibility in our models to 
deal with real situations such as this. (Notice that we do not attempt to discuss 
the very real and important cases where randomized blocks, and related uses 
of randomization are involved. We are here concerned with the flexibility re- 
quired for the simple case of two crossed classifications, where experimental 
units are not so important as to make complete randomization inadequate.) 

The flexibility of the linear model is not limited to (6”). It can easily 
accommodate any number of terms of each or all of the following kinds: 

(a) Individual distributions of $\omega'_{IJK}$ for each IJ. 
(b) Randomization of rc values $\omega''$ over the rc cells. 
(c) Randomization of a sample of rc values $\omega'''$ drawn randomly from a 
    population of size M > rc over the rc cells. 
(d) Randomization of RC values $\omega''''$ over the RC pigeonholes. 
(e) Randomization of RC values $\omega'''''$ drawn randomly from a population 
    of size > RC over the RC pigeonholes. 
The ease with which such completely randomized contributions can be added to 
linear models depends on a simple remark, which was apparently first made by 
McCarthy ([10], p. 358) in the case where all is normal and the variances are 
constant and was later exploited ([14], p. 107) in the general case; namely, that 
equal, completely symmetric correlations do not affect the average mean squares, 
except for the mean square for the general mean (to be discussed in Section 14). 
That this is true can be easily seen by the following argument. If the correlations 
are equal and negative, add a varying general contribution with variance equal 
to the correlations. This will exactly annul the correlations. If the correlations 
are equal and positive, observe that they are exactly the correlations which 
would result from such a fluctuating main effect. In either case, the desired result 
follows immediately. 
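
The remark can be illustrated by brute force. Values randomized over the cells of an unreplicated r x c layout are equally and negatively correlated, yet, averaged over all arrangements, each mean square equals the variance of the values (divisor rc - 1), just as for independent contributions of that variance. The sketch below enumerates every arrangement; the six values and the function name are hypothetical.

```python
import itertools
import numpy as np

def mean_squares_unreplicated(x):
    """Row, column, and interaction mean squares for an r x c array (n = 1)."""
    r, c = x.shape
    row, col, grand = x.mean(axis=1), x.mean(axis=0), x.mean()
    ms_rows = c * ((row - grand) ** 2).sum() / (r - 1)
    ms_cols = r * ((col - grand) ** 2).sum() / (c - 1)
    resid = x - row[:, None] - col[None, :] + grand
    ms_inter = (resid ** 2).sum() / ((r - 1) * (c - 1))
    return np.array([ms_rows, ms_cols, ms_inter])

values = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]    # hypothetical set of rc = 6 values
r, c = 2, 3

# Average each mean square over all 6! = 720 ways of placing the values.
all_ms = [mean_squares_unreplicated(np.array(p).reshape(r, c))
          for p in itertools.permutations(values)]
average_ms = np.mean(all_ms, axis=0)
target = np.var(values, ddof=1)            # variance with divisor rc - 1
print(average_ms, target)
```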
It may not be immediately appreciated why we need more than one term of a 
given kind. However, if, in the chemical reaction experiment just considered, we 
plan to use a separate piece of equipment and a separate operator for each run, 
we will wish to randomize all three variables: epoch, equipment, and operator. 
Our model will require three terms of the $\omega''$ kind. There will be three different 
sorts of experimental units! 

If we put down all the variables considered in a really complex experiment, 
even the linear model begins to look complicated. 


11. The development of the linear model. The classical linear model, as 
described at the beginning of the previous section, had the following 
disadvantages from our present point of view. 

(1) It made assumptions about the way in which the response depended on 
the two (or more) factors. 

(2) It assumed that at most one term should appear for a given set of in- 
dices. 

(3) It assumed special normalizations (like $\lambda_{I-} = 0$) rather than more 
general ones (like $\lambda_{I-}$ independent of I). 

(4) It assumed constant variance “within pigeonholes.”’ 

(These have been listed in general order of importance.) 

Models which avoided the first disadvantage were introduced (for randomized 
blocks and Latin squares) by Neyman (with the cooperation of Iwaszkiewicz 
and Kolodziejczyk) [12] and were used by McCarthy [10], but, perhaps because 
of the generally negative flavor of the papers, seem to have lain forgotten for 
many years. The results (for crossed classifications) of Memorandum Report 
18 [21] were expounded in the terms used in the last section. Similar models 
were independently and extensively used (for various situations) in the book of 
Kempthorne [8] and have been prominent in the recent work of Wilk and 
Kempthorne. For all of this, the relationship of $\lambda_{ij}$ to $\xi_i$ and $\eta_j$ when it is tied 
through the values of I(i) and J(j) is not yet as widely familiar among 
statisticians as it seems to us that it should be. 

The introduction of two or more terms with the same set of subscripts was 
also due to Neyman (with the co-operation of Iwaszkiewicz and Kolodziejczyk) 
[12] who introduced first a single term and then separated it into two parts, 
corresponding to “plot error” and “technical error.” In [14], the approach 
was to start from the parts and then combine, rather than the reverse. 

Both this difference, and the introduction of the weaker normalization in [14], 
were related to two desires: a desire to weaken assumptions wherever possible, 
and a desire to treat contributions more as things with independent existence 
rather than as differences between certain averages. The general philosophy 
ran along the lines that “‘if the effect is real and substantial, it should appear in 
the model whether or not it can be estimated from the data.” 

The assumption of constant variance “within pigeonholes’’ is a natural as- 
sumption, and is important in connection with both higher moments of mean 
squares and with the variances of contrasts and other linear combinations. As 
pointed out in [14], however, it may be dropped without any effect on average 
mean squares. 

A more recent development in the case of linear models has been the 
introduction of a pair of closely related linear models, called “the population model” 
and “the statistical model” by Wilk [22] and their further use by Wilk and 
Kempthorne [23], [24]. By introducing this distinction, the assumptions about the 
response to the various classifications can be presented first, while the assump- 
tion about the randomization involved in setting up the experiment can be 
added later. This development formalizes and makes definite distinctions hinted 
at in Kempthorne’s book [8]. 


12. The next consideration. The next step in generality, when we have 
completely randomized some contribution, to which we will attach the name 
“experimental units,” is to consider the possibility that these units can interact 
with the other classifications. (Since the results are a function of both classifica- 
tions jointly, the behavior resembles that of crossed classifications; yet, in any 
particular experiment, one classification is nested in the other. For these reasons, 
the term “cross nested” has been used informally to describe the situation.) 

It is easiest to make the situation clear when a one-way classification, say 
temperature, is involved, and epochs are randomized. If we admit that the effect 
of an epoch can be different for different temperatures, then we are led to a two- 
way set of pigeonholes—pigeonholes labelled by temperature and epoch. If 
each temperature is used once only, the actual experiment (if fully randomized) 
could be described as picking a set of pigeonholes, one in each row and column, 
and picking a value from each of these selected pigeonholes. It is as if we had the 
results of that part of a Latin square occupied by a single letter. Even though we 
cannot estimate the temperature by epoch interaction variance component, 
it is clear that it will enter into the average values of the mean squares which we 
do obtain. 

Complete randomization is a binary relation between classifications, but, 
especially when without replication, it is closely related to the ternary relation 
of the Latin square. Again independent work has led to related results. Kemp- 
thorne treated the Latin square with arbitrary interactions between rows and 
columns, but with no interactions with treatments in his book ([8], pp. 190-191). 
In particular he obtained the variance of a treatment mean. This last particular 
result is also the variance of the general mean in the unreplicated case of 
complete randomization with arbitrary interaction (with experimental units) which 
was obtained independently by Cornfield and Evans, and reported, with a 
modified proof, in Hansen, Hurwitz, and Madow ([6], pp. 262-265). So far as 
complete randomization is concerned, more general results were obtained by 
Wilk [16, 25]. 

It is clear, however, that there are a number of stages of sophistication, care, 
or cynicism (as you please) about our treatment of such a factor as epoch. Some 
of these are the following: 

(1) In both design and analysis we neglect it altogether. 
(2) We neglect it in design, and in analysis we use the additive model to 
show ourselves that we should have randomized it after all. 
(3) We randomize it in design, and in analysis we use the additive model 
to show ourselves how well off we are. 
(4) We randomize it in design, and in analysis we use the arbitrary-interac- 
tion model to show ourselves that the situation is not quite perfect. 
(5) We take it into the design of the experiment as a full-fledged factor. 
Each of these attitudes is appropriate in its place. In every experiment there 
are many variables which could enter, and one of the great skills of the experi- 
menter lies in leaving out only inessential ones. What we have just observed 
is that there are gradations between variables which are entirely out and those 
which are entirely in. 

It is generally understood that in the design of an experiment, there are three 
classes of variables: 
(i) those treated as factors, 
(ii) those randomized, and 
(iii) those neglected. 
The additive model helps us to think about the choice between neglect and ran- 
domization. The arbitrary-interaction model helps us to think about the choice 
between randomization and recognition as a factor. 


RESULTS 


13. Results for the two-way classification. The average mean squares for the 
pigeonhole model are set forth for the general case and for three special cases 
generalizing the usual fixed, random, and mixed models in Table 1. The 
important thing to notice in this table is the appearance of factors such as 

\[ 1 - \frac{c}{C}, \]

which serve to suppress certain terms completely in the fixed situation and to 
reduce their effects on the average values of the mean squares when the 
populations are finite but larger than the samples. 


TABLE 1 
General results and special cases 
(average values of mean squares) 

Line in the analysis of variance | General case | (Fixed) r = R, c = C; N infinite | (Random) c, r finite; N, C, R infinite | (Mixed) c = C, r finite; N, R infinite 
Rows | $(1 - n/N)\sigma_E^2 + n(1 - c/C)\sigma_{RC}^2 + nc\,\sigma_R^2$ | $\sigma_E^2 + nc\,\sigma_R^2$ | $\sigma_E^2 + n\sigma_{RC}^2 + nc\,\sigma_R^2$ | $\sigma_E^2 + nc\,\sigma_R^2$ 
Columns | $(1 - n/N)\sigma_E^2 + n(1 - r/R)\sigma_{RC}^2 + nr\,\sigma_C^2$ | $\sigma_E^2 + nr\,\sigma_C^2$ | $\sigma_E^2 + n\sigma_{RC}^2 + nr\,\sigma_C^2$ | $\sigma_E^2 + n\sigma_{RC}^2 + nr\,\sigma_C^2$ 
Interaction | $(1 - n/N)\sigma_E^2 + n\sigma_{RC}^2$ | $\sigma_E^2 + n\sigma_{RC}^2$ | $\sigma_E^2 + n\sigma_{RC}^2$ | $\sigma_E^2 + n\sigma_{RC}^2$ 
Error | $\sigma_E^2$ | $\sigma_E^2$ | $\sigma_E^2$ | $\sigma_E^2$ 


Notice in particular how, when R = C = N = infinity, $\sigma_E^2$ follows $\sigma_{RC}^2$ 
everywhere, just as the lamb followed Mary. This is the usual result for a 
random model, and appears here in a situation of far greater generality. 

The disappearance of $\sigma_{RC}^2$ from the average values of row and column mean 
squares in the fixed case is also familiar. 

In the mixed model the average mean square for rows does not involve the 
interaction variance component (because all columns were observed!), while 
the average mean square for columns involves it with unit weight (because only 
a small sample of rows were observed!). In each case the behavior of the other 
variable (or, in general, the other variables) determines whether the interaction 
appears or not. 

That the result for the mixed model is a consequence solely of the urn sampling 
approach, and not of the special definitions of interaction and row components 
of variance used, may be seen from the following simple example. Consider a 
population composed of three rows and two columns, with one element per 
cell, having the following numerical values: 

      0   100 
    100     0 
     50    50 


Each of the three possible samples of two rows and two columns will yield a 
row mean square of zero and an interaction mean square greater than zero. 
Hence for this population the interaction mean square exceeds the row mean 
square for each sample. Testing the row mean square against interaction would 
be obviously incorrect. 
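
The three samples can be worked out directly. In the sketch below the second row of the population is taken to be (100, 0); this is an assumption, being the only choice consistent with the statement that every sample has a zero row mean square and a positive interaction mean square. The function name is hypothetical.

```python
import itertools
import numpy as np

# Population of three rows and two columns, one element per cell.
# The middle row (100, 0) is assumed, as explained in the lead-in.
population = np.array([[0.0, 100.0],
                       [100.0, 0.0],
                       [50.0, 50.0]])

def row_and_interaction_ms(x):
    """Row and interaction mean squares of an r x c array with n = 1."""
    r, c = x.shape
    row, col, grand = x.mean(axis=1), x.mean(axis=0), x.mean()
    ms_rows = c * ((row - grand) ** 2).sum() / (r - 1)
    resid = x - row[:, None] - col[None, :] + grand
    ms_inter = (resid ** 2).sum() / ((r - 1) * (c - 1))
    return ms_rows, ms_inter

# Each sample takes two of the three rows and both columns.
results = {}
for rows in itertools.combinations(range(3), 2):
    results[rows] = row_and_interaction_ms(population[list(rows), :])

for rows, (ms_rows, ms_inter) in results.items():
    print(rows, ms_rows, ms_inter)
```

Every row has the same mean, so the row mean square vanishes in each sample, while the interaction mean square is 10000 for the first pair of rows and 2500 for the other two pairs.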

These results are very general. We assumed only a symmetrical placing of RC 
sets of numbers in RC pigeonholes and a well-defined method of extracting the 
rcn numbers entering the actual analysis of variance. Otherwise, the numbers 
can be as arbitrary as we please. 
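
The generality of these results can be checked by exact enumeration on a small array of arbitrary numbers. The sketch below assumes the general-case entries of Table 1 in the form (1 - n/N)σ_E² + n(1 - c/C)σ_RC² + ncσ_R² for rows, the analogous expression for columns, (1 - n/N)σ_E² + nσ_RC² for interaction, and σ_E² for error, with the variance components defined as in Section 7. All names and the test array are hypothetical.

```python
import itertools
import numpy as np

def variance_components(X):
    """(sigma_R^2, sigma_C^2, sigma_RC^2, sigma_E^2) of an (R, C, N) array."""
    R, C, N = X.shape
    cell = X.mean(axis=2)
    row, col, grand = cell.mean(axis=1), cell.mean(axis=0), cell.mean()
    inter = cell - row[:, None] - col[None, :] + grand
    return (((row - grand) ** 2).sum() / (R - 1),
            ((col - grand) ** 2).sum() / (C - 1),
            (inter ** 2).sum() / ((R - 1) * (C - 1)),
            ((X - cell[:, :, None]) ** 2).sum() / (R * C * (N - 1)))

def mean_squares(x):
    """Row, column, interaction, and error mean squares of an (r, c, n) array."""
    r, c, n = x.shape
    cell = x.mean(axis=2)
    row, col, grand = cell.mean(axis=1), cell.mean(axis=0), cell.mean()
    inter = cell - row[:, None] - col[None, :] + grand
    return np.array([n * c * ((row - grand) ** 2).sum() / (r - 1),
                     n * r * ((col - grand) ** 2).sum() / (c - 1),
                     n * (inter ** 2).sum() / ((r - 1) * (c - 1)),
                     ((x - cell[:, :, None]) ** 2).sum() / (r * c * (n - 1))])

def average_mean_squares(X, r, c, n):
    """Average the mean squares over every possible sample, equally weighted."""
    R, C, N = X.shape
    subsets = list(itertools.combinations(range(N), n))
    total, count = np.zeros(4), 0
    for rows in itertools.combinations(range(R), r):
        for cols in itertools.combinations(range(C), c):
            for picks in itertools.product(subsets, repeat=r * c):
                x = np.empty((r, c, n))
                for a, I in enumerate(rows):
                    for b, J in enumerate(cols):
                        x[a, b, :] = X[I, J, list(picks[a * c + b])]
                total += mean_squares(x)
                count += 1
    return total / count

rng = np.random.default_rng(2)
X = rng.integers(0, 10, size=(3, 3, 3)).astype(float)   # arbitrary numbers
R, C, N = X.shape
r, c, n = 2, 2, 2
s2_R, s2_C, s2_RC, s2_E = variance_components(X)
observed = average_mean_squares(X, r, c, n)
expected = np.array([
    (1 - n / N) * s2_E + n * (1 - c / C) * s2_RC + n * c * s2_R,
    (1 - n / N) * s2_E + n * (1 - r / R) * s2_RC + n * r * s2_C,
    (1 - n / N) * s2_E + n * s2_RC,
    s2_E,
])
print(observed)
print(expected)
```

The enumeration covers 729 samples here, so the comparison is exact up to rounding rather than a matter of simulation error.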


14. The mean square for the general mean. While the central facts have 
been pointed out from time to time (e.g., Cochran [3], Hendricks [7]), and while 
a number of mathematical statisticians have made use of the fact in their lec- 
tures, it does not seem to be widely recognized that it is almost always feasible 
to complete the analysis of variance table by adding a line for the general mean, 
and that when this has been done, both classical analysis of variance and sampling 
from finite populations become special cases of a unified situation. We feel that 
wider recognition and use of this fact would be valuable. 

In the present situation we have only to introduce a reference origin M (which 
is freely at our disposal) and define both the sum of squares for the general mean 
and the mean square for the general mean (there is but one degree of freedom) as 

\[ rcn\,(x_{\cdot\cdot\cdot} - M)^2. \]

(This is what is frequently known as the “correction term,” and is appropriate 
to the use of M as a computing origin.) It is easy to show that we have the 
following average value of the mean square for the mean: 

\[ rcn\,\sigma_G^2 + cn\left(1 - \frac{r}{R}\right)\sigma_R^2 + rn\left(1 - \frac{c}{C}\right)\sigma_C^2
   + n\left(1 - \frac{r}{R}\right)\left(1 - \frac{c}{C}\right)\sigma_{RC}^2
   + \left(1 - \frac{n}{N}\right)\sigma_E^2 + rcn\,(\mathrm{ave}\,\{x_{\cdot\cdot\cdot}\} - M)^2, \]


which, like all average mean squares in a balanced analysis of variance, decomposes 
into (F)[(G) + (H)], where 

(F) = number of units involved in each corresponding mean (rcn in the 
      example), 

(G) = variance (so far as differences are concerned) of the corresponding 
      mean, 

\[ = \sigma_G^2 + \left(\frac{1}{r} - \frac{1}{R}\right)\sigma_R^2 + \left(\frac{1}{c} - \frac{1}{C}\right)\sigma_C^2
     + \left(\frac{1}{r} - \frac{1}{R}\right)\left(\frac{1}{c} - \frac{1}{C}\right)\sigma_{RC}^2
     + \frac{1}{rc}\left(\frac{1}{n} - \frac{1}{N}\right)\sigma_E^2 \]
      in the example, 

(H) = variance of corresponding contributions (usually measured among 
      themselves, but in this case necessarily measured from the 
      arbitrary origin, M, since there is only one general mean, and hence 
      $= (\mathrm{ave}\,\{x_{\cdot\cdot\cdot}\} - M)^2$). 
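
The agreement between the factored form (F)(G) and the expanded average value can be confirmed with exact rational arithmetic. The sketch assumes the formulas read as rcnσ_G² + cn(1 - r/R)σ_R² + rn(1 - c/C)σ_C² + n(1 - r/R)(1 - c/C)σ_RC² + (1 - n/N)σ_E² for the expanded form, with the origin term (H) omitted from both sides since it is common to them; the function names and numerical values are hypothetical.

```python
from fractions import Fraction

def expanded_form(r, c, n, R, C, N, sG, sR, sC, sRC, sE):
    """Average mean square for the general mean, written out term by term."""
    one = Fraction(1)
    return (r * c * n * sG
            + c * n * (one - Fraction(r, R)) * sR
            + r * n * (one - Fraction(c, C)) * sC
            + n * (one - Fraction(r, R)) * (one - Fraction(c, C)) * sRC
            + (one - Fraction(n, N)) * sE)

def factored_form(r, c, n, R, C, N, sG, sR, sC, sRC, sE):
    """The same quantity as (F) times (G)."""
    F = r * c * n                                   # units in the mean
    dr = Fraction(1, r) - Fraction(1, R)
    dc = Fraction(1, c) - Fraction(1, C)
    dn = Fraction(1, n) - Fraction(1, N)
    G = sG + dr * sR + dc * sC + dr * dc * sRC + Fraction(1, r * c) * dn * sE
    return F * G

# Hypothetical sample sizes, population sizes, and variance components.
sizes = dict(r=2, c=3, n=4, R=5, C=7, N=9)
components = dict(sG=Fraction(1), sR=Fraction(2), sC=Fraction(3),
                  sRC=Fraction(4), sE=Fraction(5))
lhs = expanded_form(**sizes, **components)
rhs = factored_form(**sizes, **components)
print(lhs, rhs)
```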


15. Results for the three-way classification. The pigeonhole model for the 
three-way classification is the natural generalization of the two-way pigeonhole 
model. There are RCS pigeonholes, one in each combination of R rows, C col- 
umns, and S slices. In each pigeonhole there is a population. Random samples of 
r rows from R, c columns from C, and s slices from S are independently drawn. 
The intersections of the drawn rows, columns, and slices determine rcs cells. 
In each of these rcs cells n elements are drawn at random. All drawings, whether 
of rows, columns, slices, or elements are independent. 

The variance components are defined in a way similar to that for the two-way. 
They differ from the (hypothetical) mean squares we would get, if we analyzed 
the entire model, by simple factors such as r, c, s, etc. 

The average values of the mean squares are given for the general case in Table 
2. Here, with obvious extensions of notation, 


$$\sigma_E^2 = \frac{1}{RCS}\sum_I \sum_J \sum_K \sigma_{IJK}^2,$$


(1 "Nv > (1 “?_ ) cise, 
d JK 


and

$\sigma_G^2$ = variance of the general contribution.

(Except for the top line, these results will be found in Bennett and Franklin
([2], p. 394), while generalizations have been given by Kempthorne and Wilk
[23] and by Wilk [25].)

While the results for the three-way are not too complex, and are obviously
systematic and orderly, they do take up considerable space. If we are to set out
the corresponding results for factorial arrangements with more factors, there will
be a premium on more compactness.


TABLE 2
Average values of mean squares in the general (replicated) three-way classification

Item | DF | Average value of mean square

General mean | $1$ |
    $(1 - n/N)\sigma_E^2 + n(1 - r/R)(1 - c/C)(1 - s/S)\sigma_{RCS}^2$
    $+ nr(1 - c/C)(1 - s/S)\sigma_{CS}^2 + nc(1 - r/R)(1 - s/S)\sigma_{RS}^2$
    $+ ns(1 - r/R)(1 - c/C)\sigma_{RC}^2 + nrc(1 - s/S)\sigma_S^2 + ncs(1 - r/R)\sigma_R^2$
    $+ nrs(1 - c/C)\sigma_C^2 + ncrs\,\sigma_G^2 + ncrs\,(\mathrm{ave}\{x\} - M)^2$

Rows (R) | $r - 1$ |
    $(1 - n/N)\sigma_E^2 + n(1 - c/C)(1 - s/S)\sigma_{RCS}^2$
    $+ nc(1 - s/S)\sigma_{RS}^2 + ns(1 - c/C)\sigma_{RC}^2 + ncs\,\sigma_R^2$

Columns (C) | $c - 1$ |
    $(1 - n/N)\sigma_E^2 + n(1 - r/R)(1 - s/S)\sigma_{RCS}^2$
    $+ nr(1 - s/S)\sigma_{CS}^2 + ns(1 - r/R)\sigma_{RC}^2 + nrs\,\sigma_C^2$

Slices (S) | $s - 1$ |
    $(1 - n/N)\sigma_E^2 + n(1 - r/R)(1 - c/C)\sigma_{RCS}^2$
    $+ nr(1 - c/C)\sigma_{CS}^2 + nc(1 - r/R)\sigma_{RS}^2 + ncr\,\sigma_S^2$

RC | $(r - 1)(c - 1)$ | $(1 - n/N)\sigma_E^2 + n(1 - s/S)\sigma_{RCS}^2 + ns\,\sigma_{RC}^2$

RS | $(r - 1)(s - 1)$ | $(1 - n/N)\sigma_E^2 + n(1 - c/C)\sigma_{RCS}^2 + nc\,\sigma_{RS}^2$

CS | $(c - 1)(s - 1)$ | $(1 - n/N)\sigma_E^2 + n(1 - r/R)\sigma_{RCS}^2 + nr\,\sigma_{CS}^2$

RCS | $(r - 1)(c - 1)(s - 1)$ | $(1 - n/N)\sigma_E^2 + n\,\sigma_{RCS}^2$

Repetition | $rcs(n - 1)$ | $\sigma_E^2$


16. Unreplicated factorials in general. We can obtain this compactness by
setting up some rules which will give the coefficient of any variance component
in the average value of any mean square.

We begin with the case without replication and the almost obvious

Rule 1. Unless the indices of the variance component include all those of
the mean square, the coefficient vanishes.

As a result of this rule, we can confine our attention to cases where the indices fall
into three categories:
(1) those appearing in both mean square and variance component,
(2) those appearing in variance component but not in mean square,
(3) those appearing in neither.
In these terms we can now state

Rule 2. Any coefficient which does not vanish because of Rule 1 is the
product of a small letter for each index which fails to appear in the variance
component and of a factor

$$\left(1 - \frac{\text{(small letter)}}{\text{(capital letter)}}\right)$$

for each index in the variance component which does not appear in the mean
square.

Let us discuss some examples. Consider the coefficient of $\sigma_{CS}^2$ in the mean square
for columns in an unreplicated three-way. Here the behavior of the indices is as
follows:

(1) C appears in both,
(2) S appears only in the variance component,
(3) R appears in neither.

The coefficient is, therefore,

$$r\left(1 - \frac{s}{S}\right),$$

and, when we recall that $n = 1$, we see that this checks the entry in Table 2.

Consider a seven-way classification with indices R, C, D, E, F, G, S and the
average value of the mean square for the RDG-interaction. What are the
coefficients of $\sigma_{RCDEF}^2$ and $\sigma_{RCDEGS}^2$? Since G is not an index of
$\sigma_{RCDEF}^2$, its coefficient is zero by Rule 1. For the second variance component,
the indices behave as follows:
(1) R, D, and G appear in both,
(2) C, E, and S appear only in the variance component,
(3) F appears in neither.
By Rule 2, the coefficient is

$$f\left(1 - \frac{c}{C}\right)\left(1 - \frac{e}{E}\right)\left(1 - \frac{s}{S}\right).$$

The rules are relatively easy to apply.
We can summarize the action of both rules in a 2 × 2 table to be applied once
for each subscript. This is done for a general subscript Q in Table 3.
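
Rules 1 and 2 are mechanical enough to be written as a short routine. The following is a minimal sketch in Python, not part of the original paper; the function name `factorial_coefficient` and the numerical sizes are hypothetical, chosen only for illustration. Each index is a single letter, `small` holds the experimental dimensions and `capital` the pigeonhole dimensions.

```python
from fractions import Fraction


def factorial_coefficient(ms_indices, vc_indices, small, capital):
    """Coefficient of a variance component in the average value of a mean
    square for an unreplicated factorial, following Rules 1 and 2."""
    ms, vc = set(ms_indices), set(vc_indices)
    # Rule 1: the variance component must carry every index of the mean square.
    if not ms <= vc:
        return Fraction(0)
    coeff = Fraction(1)
    for q in small:
        if q not in vc:
            # Index absent from the variance component: factor q (small letter).
            coeff *= small[q]
        elif q not in ms:
            # Index in the variance component only: factor 1 - q/Q.
            coeff *= 1 - Fraction(small[q], capital[q])
        # Index in both: factor 1.
    return coeff


# Hypothetical sizes for a three-way layout (illustration only).
small = {"R": 3, "C": 4, "S": 5}
capital = {"R": 6, "C": 8, "S": 10}

# Coefficient of sigma^2_CS in the columns mean square: r(1 - s/S).
coef_cs_in_columns = factorial_coefficient("C", "CS", small, capital)
```

With these sizes the call reproduces $r(1 - s/S) = 3 \cdot \tfrac{1}{2}$, the first example above, and any variance component lacking an index of the mean square returns zero.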


TABLE 3

2 × 2 table giving the factor in the coefficient due to any one subscript, and thus
expressing both rules for factorials

                              Subscripts of variance component
Subscripts of mean square     Q present       | Q absent

Q present                     $1$             | $0$

Q absent                      $1 - q/Q$       | $q$


17. More general designs. Two remarks will conspire to lead us to more
general expressions of the rules. First, we note a remarkable similarity between
these coefficients, and those which arise when one classification is nested within
another. Second, we notice that there is a useful sense in which a replicated
factorial is not purely factorial in structure: in the pigeonhole model we allowed a
population to nest in each pigeonhole. We are impelled to seek some general
rules which apply when classifications are crossed (as in a factorial) or nested
in any way. (We shall find that noninteracting completely randomized
contributions are more or less automatically included.)

These relations are not the only important relations between categories. We 
have mentioned randomized blocks above. There are also, for example, the rela- 
tions involved in balanced and unbalanced incomplete blocks, to which our rules 
will not apply. But crossing and nesting are the simplest, and a treatment for 
any combination of them will be well worthwhile. 

The scope of such a treatment will be broader than just the arrangements so
far mentioned. It is quite possible, for example, to have a one-way array of small
pigeonholes nested in each of a two-way array of large pigeonholes and a
population nested in each of the small pigeonholes. This will occur in the example of a
chemical reaction where temperature and pressure define the large pigeonholes,
for instance, if we repeat the reaction several times for each pressure and
temperature combination and analyze several portions of each result. Here "sampling
and analysis" requires a population nested in "batch," while a one-way array of
"batches" is nested in every "temperature-pressure" pigeonhole.

The general rules are based on a system of indices such that a mean square or 
variance component referring to something of smaller scope must have all the 
indices of the quantity of larger scope, and one or more in addition. A possible 
set of indices for the modified chemical reaction example just described might be 


T = temperature, 
P = pressure, 
PT = their interactions, 
BPT = batches (within PT combinations). 
EBPT = sampling and analysis (within batches). 


In this example it makes no sense to define a B main contribution across
temperature or pressure, and hence we should not in this notation try to define a B mean
square or a B variance component. Indeed we should also not try to use BT- or
BP- quantities of any sort. In order to inquire, for example, how many batches
to use to obtain a given accuracy, we should be concerned with "batches within a
temperature and pressure combination" and in this notation this is a BPT
effect. (It would perhaps be clearer to use B(PT) or B ⊂ PT, but we shall not do
this here.)

Any set of indices has certain indices (there may be none) toward which it
behaves like an interaction. This means that if we sum the corresponding
contributions over such an index (covering the pigeonhole model) the total (and hence
the mean) is a constant. In the example just discussed

Set of indices | Interactionlike indices

T              | T
P              | P
PT             | P, T
BPT            | B
EBPT           | E

The appearance of B and E as interactionlike indices merely means that the
average contributions for "batches" and "sampling and analysis" are defined to
be zero. The absence of T in the last two cases reflects the fact that summing over
one batch at each temperature, or summing over one sample and analysis at
each temperature, will not ensure the disappearance of the mean batch
contribution or the mean sampling and analysis contribution.

We continue to use capital letters for the dimensions of the pigeonhole model, 
and small letters for the corresponding dimensions of the experiment. We can 
now state a single unified rule (covering crossed and nested relations, and classifi- 
cations completely randomized upon them) as follows: 

Rule: Divide the indices into five groups as follows:

(1) those which appear in the mean square but not in the variance component;

(2) those which appear in the variance component and not in the mean
square and are interactionlike for the set of indices defining the variance
component;

(3) those which appear in the variance component but not in the mean
square, and are not interactionlike;

(4) those which appear in both;

(5) those which appear in neither.

The coefficient with which the variance component appears in the average value
of the mean square is the product of a factor for each index, the factors being as
follows:

(a) for group (1) each factor is zero,

(b) for group (2) each factor is

$$\left(1 - \frac{\text{(small letter)}}{\text{(capital letter)}}\right),$$

(c) for groups (3) and (4) each factor is unity,

(d) for group (5) each factor is the corresponding small letter.

This rule applies so long as all categories are related by crossing or nesting.
In this form, this rule includes the rules of the previous section, and can also
be expressed in a simple table, as in Table 4.


TABLE 4

Factor due to any subscript, Q, in the coefficient of any variance component in any
average mean square for any combination of crossing and nesting

                              Subscripts of variance component
Subscripts of mean square     Q present and     | Q present, not    | Q absent
                              interactionlike   | interactionlike

Q present                     $1$               | $1$               | $0$

Q absent                      $1 - q/Q$         | $1$               | $q$


18. Reasons for the general rules. As indicated by the discussion of Section
23, below, the basic reasons for the general rules are two:
(1) Mean squares are ordinarily expressed on a per-element basis, so the
number of elements associated with specific values of the indices of a mean
square arises as a factor.
(2) Where an index is interactionlike we are dealing with variances of
sample means, so that factors of

$$\frac{1}{\text{(small letter)}} - \frac{1}{\text{(capital letter)}}$$

arise from the formula for the variance of a mean from a finite population.
These reasons are equally compelling for crossing and nesting.
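
The unified rule of Section 17 can be sketched in the same way as a per-index product. The routine below is an illustrative Python sketch, not the authors' procedure: the name `unified_coefficient` and all numerical sizes are hypothetical, and the caller must supply the interactionlike indices of the variance component (here taken as P and T for PT, and B alone for BPT, which is an assumption about the chemical reaction example).

```python
from fractions import Fraction


def unified_coefficient(ms, vc, interactionlike, small, capital):
    """Coefficient of variance component `vc` in the average value of mean
    square `ms`, for indices related by crossing or nesting.

    `interactionlike` lists the indices toward which `vc` behaves like an
    interaction; it is an input, not something this sketch derives."""
    ms, vc, il = set(ms), set(vc), set(interactionlike)
    coeff = Fraction(1)
    for q in small:
        in_ms, in_vc = q in ms, q in vc
        if in_ms and not in_vc:
            return Fraction(0)                      # group (1): factor zero
        if in_vc and not in_ms and q in il:
            coeff *= 1 - Fraction(small[q], capital[q])  # group (2)
        elif not in_vc and not in_ms:
            coeff *= small[q]                       # group (5): small letter
        # groups (3) and (4): factor unity
    return coeff


# Hypothetical sizes for temperature, pressure, batches, and samples.
small = {"T": 3, "P": 2, "B": 4, "E": 2}
capital = {"T": 6, "P": 4, "B": 16, "E": 20}

# sigma^2_PT in the T mean square: b * e * (1 - p/P).
coef_pt_in_t = unified_coefficient("T", "PT", "PT", small, capital)
# sigma^2_BPT in the T mean square: e * (1 - b/B); P is not interactionlike.
coef_bpt_in_t = unified_coefficient("T", "BPT", "B", small, capital)
```

Passing the interactionlike set explicitly keeps the sketch honest about the one piece of structure that crossing and nesting add to the factorial rules.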


19. Non-interacting, completely randomized terms in linear models. We 
discussed briefly, at the end of Section 10, some five kinds of added terms which 
might be added to the linear model for the two-way classification. We summarize 
in Table 5 the coefficients with which the corresponding variances or mean 
variances will appear. These results are easily obtained by the usual way of 
dealing with independent linear models. Observe that if we square the model, 
the average value of all cross-terms will cancel out (except in the mean square for 
the general mean) and hence that average mean squares can be evaluated sepa- 
rately for the original contributions and those of these kinds. 

It was emphasized (at the end of Section 10) that the easy addition of such 
terms was a consequence of a general principle. Several such terms can be added. 
If the modified chemical reaction example discussed in the last section involved 
two raw materials, we might have completely randomized packages of one over 
batches, and have made up one solution of the other for each pressure-temperature
combination, randomizing the allotment of solutions. If, in addition, some
TABLE 5


Average value of mean square
for rows, columns, interaction





aioe 


1 
rc 
(c) | M 
rc 
() RC 
re 
M 


(e) 


other relevant item were completely randomized over individual samples, we
should have a model made up of at least four parts:

(1) dependence of yield on pressure and temperature combined with routine
fluctuations and errors,

(2) effects of first raw material,

(3) effects of second raw material,

(4) effects of other relevant factors.

If we can properly assume an absence of interaction between these four parts,
then we will obtain four sets of (partial) variance components as follows:

(1) appears in EBPT, BPT, PT, P and T,

(2) appears in BPT, PT, P and T,

(3) appears in PT, P and T,

(4) appears in EBPT, BPT, PT, P and T.

Because of the no-interaction-across-parts assumption, the four sets will behave
entirely independently, and the rules of the last section will apply to each
separately.

It will only be necessary to remember that (so long as we are not concerned 
with the general mean) no index is interactionlike for contributions which are 
completely randomized. This same principle will apply to other examples where 
absence of interaction between completely randomized parts is appropriately 
assumed. 


PROOFS 


20. The nature of the various proofs. A number of apparently quite different
ways have been developed to carry out the proofs of the formulas for average
values of mean squares (see Section 4 for references). They fall quite neatly
into two categories:

(1) Proofs using special machinery or indirect methods (e.g., symmetry
arguments and equating of coefficients for special assumptions as in
[18] and [20]).
(2) Proofs using relatively straightforward algebra (e.g., as in [17], [16],
and [21]).

While we feel that the first sort of proof offers the real hope for dealing with the
more difficult problems which lie ahead, we recognize the usefulness of having
examples of the straightforward proofs on record, and have endeavored to keep
our proofs direct and to hold the use of special techniques to a minimum.

In the two-way pigeonhole model three samplings take place independently, a 
sampling of rows, a sampling of columns, and a sampling within cells. The sepa- 
rateness and independence of these samplings is very important in reaching a 
moderately simple direct proof, as is the possibility of considering the separate 
samplings as occurring in any order, and then using quantities defined by some 
array intermediate between the underlying array and the observed array. It is 
only by combining such a choice with a well-chosen notation and order of pro- 
cedure that we can keep the algebra from becoming quite heavy. 

The original proof [20] was carried through explicitly for the two-way case
without replication in the cell ($n = 1$, $N = \infty$), made explicit use of linear
models, and depended on two comparisons of three situations:

(1) rows fixed, columns fixed, within cells sampled,
(2) rows fixed, columns sampled, within cells sampled,
(3) rows sampled, columns sampled, within cells sampled.

The next proof [17], found without knowledge of the first result, made explicit
use of urn sampling and was carried through explicitly for the two-way case with
$\sigma_{IJ}^2$ = constant, and rested most conveniently on the intermediate situation
where the within-cell sampling had been completed in each of the RC cells, but
the r rows and c columns of the observed array had not yet been fixed. (Note
that this requires thinking about, and calculating with, the cell means for the
RC − rc cells which will not be observed.) The material in Section 18 is modelled
after this proof.

To obtain the average value of the error mean square it is most convenient to
think of the sampling as occurring in exactly the reverse order. Here it is most
convenient to rest on the intermediate situation where the r rows and c columns
have been fixed, but the n individuals to be selected from N in each of these rc
cells are still unspecified.


21. The error mean square. The error mean square can be written in various
forms, in particular as

$$\frac{1}{rc(n - 1)}\sum_i \sum_j \sum_k (x_{ijk} - x_{ij\cdot})^2 = \frac{1}{rc}\sum_i \sum_j \left[\frac{1}{n - 1}\sum_k (x_{ijk} - x_{ij\cdot})^2\right].$$

So long as we think of the r rows and c columns as fixed, we have the mean of rc
terms of the form

$$\frac{1}{n - 1}\sum_k (x_{ijk} - x_{ij\cdot})^2,$$

and the average value of this term is well known from the theory of finite sampling
to be $\sigma_{I(i),J(j)}^2$. Thus the average value of the error mean square is

$$\mathrm{ave}\left\{\frac{1}{rc}\sum_i \sum_j \sigma_{I(i),J(j)}^2\right\} = \frac{1}{RC}\sum_{I=1}^{R}\sum_{J=1}^{C} \sigma_{IJ}^2 = \sigma_E^2.$$

Even though this is clear, we shall give a formal proof as an introduction to the
use of a significant technique.


22. Indicator variables. The direct evaluation of the average values of mean
squares is greatly facilitated by the use of a simple device [4]. We introduce a
set of indicator variables $a_1, a_2, \ldots, a_R$, one for each row in the underlying
array. The value of these variables depends on the particular sample of rows
which has been selected for the actual array, and is given by:

$$a_I = \begin{cases} 1, & \text{if the } I\text{th row is in the sample of rows,} \\ 0, & \text{otherwise.} \end{cases}$$

From our assumptions about the sampling,

$$\mathrm{ave}\{a_I\} = \mathrm{ave}\{a_I^2\} = \frac{r}{R}, \qquad \mathrm{ave}\{a_I a_K\} = \frac{r(r - 1)}{R(R - 1)} \quad (I \ne K),$$

where we write "ave" for the average over all possible samples (we could have
written "E" for expectation, but we preferred the more perspicuous notation).
Similarly, we introduce indicator variables $b_1, b_2, \ldots, b_C$ for columns, by

$$b_J = \begin{cases} 1, & \text{if the } J\text{th column is in the sample of columns,} \\ 0, & \text{otherwise.} \end{cases}$$

As an illustration of the use of these variables, consider the averaging over all
samplings of rows and columns of

$$\frac{1}{rc}\sum_i \sum_j \sigma_{I(i),J(j)}^2 = \frac{1}{rc}\sum_I \sum_J a_I b_J \sigma_{IJ}^2,$$

with which we closed the last section. The average value is to be found by
replacing $a_I$ by $r/R$ and $b_J$ by $c/C$, and we have the result announced above.
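
The averages of the indicator variables can be checked by brute force for a small case. The sketch below (an illustration only; the function name `indicator_averages` is hypothetical) enumerates every equally likely sample of r rows from R and averages $a_1$ and $a_1 a_2$ exactly.

```python
from fractions import Fraction
from itertools import combinations


def indicator_averages(R, r):
    """Exact averages of a_1 and a_1 * a_2 over all samples of r rows from R
    (requires R >= 2); rows are labelled 0, 1, ..., R - 1."""
    samples = list(combinations(range(R), r))
    n_samples = len(samples)
    # a_1 = 1 exactly when row 0 is in the sample.
    ave_a = Fraction(sum(1 for s in samples if 0 in s), n_samples)
    # a_1 * a_2 = 1 exactly when rows 0 and 1 are both in the sample.
    ave_aa = Fraction(sum(1 for s in samples if 0 in s and 1 in s), n_samples)
    return ave_a, ave_aa


ave_a, ave_aa = indicator_averages(R=6, r=3)
```

For $R = 6$ and $r = 3$ the two averages agree with $r/R$ and $r(r-1)/R(R-1)$, which is all that the replacement argument above requires.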


23. Fundamentals of bisampling. We now consider (i) an arbitrary population
of R-by-C arrays $\{y_{IJ}: I = 1, 2, \ldots, R;\ J = 1, 2, \ldots, C\}$; (ii) the operation
of randomly sampling r rows and c columns; (iii) the resulting r-by-c arrays; and
(iv) the row, column, and grand means for these r-by-c arrays. It is our purpose to
calculate averages of certain symmetric quadratic expressions in the $y_{IJ}$, both
certain symmetric combinations of variances and covariances of these elements
and means, and the averages of certain differences of squares. In the next section
knowledge of these combinations will immediately lead us to the formulas for
average values of mean squares in a (replicated) two-way classification.

We write, and thus change our convention about capital letters,

$$Y_{IJ} = \mathrm{ave}\{y_{IJ}\},$$

$$Y_I = \frac{1}{C}\sum_J Y_{IJ}, \qquad y_I = \frac{1}{c}\sum_J b_J y_{IJ},$$

$$Z_J = \frac{1}{R}\sum_I Y_{IJ}, \qquad z_J = \frac{1}{r}\sum_I a_I y_{IJ},$$

$$Y = \frac{1}{RC}\sum_I \sum_J Y_{IJ}, \qquad y = \frac{1}{c}\sum_J b_J z_J = \frac{1}{r}\sum_I a_I y_I,$$

and note that

$$\mathrm{ave}\{y_I\} = Y_I, \qquad \mathrm{ave}\{z_J\} = Z_J, \qquad \mathrm{ave}\{y\} = Y.$$

We shall find these notations convenient, although we could have written $Y_{I-}$
for $Y_I$, $Y_{-J}$ for $Z_J$, and $Y_{--}$ for $Y$. If we had done this, there would have been no
convenient parallel notation for the quantities denoted by small letters, since $y_I$
is defined, and will be used, whether or not row $I$ appears in a particular sample
of rows, and hence in a particular r-by-c array.

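We shall want our results expressed simply. They will have to involve both the
average values, $Y_{IJ}$, and the variances and covariances, of the $y_{IJ}$. By intuition,
or by working out the answer, we can see that they will involve only a few rather
symmetric combinations of these moments, namely the three variance components
corresponding to the $Y_{IJ}$:

$$\sigma_{row}^2 = \frac{1}{R - 1}\left[\sum_I Y_I^2 - RY^2\right] = \frac{1}{R - 1}\sum_I (Y_I - Y)^2,$$

$$\sigma_{col}^2 = \frac{1}{C - 1}\left[\sum_J Z_J^2 - CY^2\right] = \frac{1}{C - 1}\sum_J (Z_J - Y)^2,$$

$$\sigma_{int}^2 = \frac{1}{(R - 1)(C - 1)}\left[\sum_I \sum_J Y_{IJ}^2 - C\sum_I Y_I^2 - R\sum_J Z_J^2 + CRY^2\right]$$

$$= \frac{1}{(R - 1)(C - 1)}\sum_I \sum_J (Y_{IJ} - Y_I - Z_J + Y)^2,$$

and certain mean variances and covariances, namely,

$$\rho_1 = \frac{1}{RC}\sum_I \sum_J \mathrm{var}\{y_{IJ}\},$$

$$\rho_2 = \frac{1}{RC(C - 1)}\sum_I \sum_{J \ne L} \mathrm{cov}\{y_{IJ}, y_{IL}\},$$

$$\rho_3 = \frac{1}{R(R - 1)C}\sum_{I \ne K} \sum_J \mathrm{cov}\{y_{IJ}, y_{KJ}\},$$

$$\rho_4 = \frac{1}{R(R - 1)C(C - 1)}\sum_{I \ne K} \sum_{J \ne L} \mathrm{cov}\{y_{IJ}, y_{KL}\}.$$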

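We begin by writing $y_I y_K$ out as

$$y_I y_K = \frac{1}{c^2}\left[\sum_J b_J y_{IJ} y_{KJ} + \sum_{J \ne L} b_J b_L y_{IJ} y_{KL}\right]$$

and using the independence of the b's and the y's in

$$\mathrm{ave}\{b_J b_L y_{IJ} y_{KL}\} = \frac{c(c - 1)}{C(C - 1)}\left[Y_{IJ} Y_{KL} + \mathrm{cov}\{y_{IJ}, y_{KL}\}\right]$$

to find

$$\mathrm{ave}\{y_I y_K\} = \frac{1}{cC}\left(\sum_J Y_{IJ} Y_{KJ} + \frac{c - 1}{C - 1}\sum_{J \ne L} Y_{IJ} Y_{KL}\right)$$

$$+ \frac{1}{cC}\left[\sum_J \mathrm{cov}\{y_{IJ}, y_{KJ}\} + \frac{c - 1}{C - 1}\sum_{J \ne L} \mathrm{cov}\{y_{IJ}, y_{KL}\}\right].$$

If we use

$$\sum_J Y_{IJ} Y_{KJ} + \sum_{J \ne L} Y_{IJ} Y_{KL} = C^2 Y_I Y_K$$

and $\mathrm{ave}\{y_I\} = Y_I$, $\mathrm{ave}\{y_K\} = Y_K$, and reduce, we have

$$\mathrm{cov}\{y_I, y_K\} = \left(\frac{1}{c} - \frac{1}{C}\right)\frac{1}{C - 1}\left(\sum_J Y_{IJ} Y_{KJ} - C Y_I Y_K\right)$$

$$+ \frac{1}{cC}\left[\sum_J \mathrm{cov}\{y_{IJ}, y_{KJ}\} + \frac{c - 1}{C - 1}\sum_{J \ne L} \mathrm{cov}\{y_{IJ}, y_{KL}\}\right]. \tag{1}$$

This is the key result.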
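If we sum this over $I$ and $K$ with $I \ne K$ and reduce, using

$$\sum_{I \ne K} Y_{IJ} Y_{KJ} + \sum_I Y_{IJ}^2 = R^2 Z_J^2,$$

we find

$$\sum_{I \ne K} \mathrm{cov}\{y_I, y_K\} = -\left(\frac{1}{c} - \frac{1}{C}\right)\frac{1}{C - 1}\left[\sum_I \sum_J Y_{IJ}^2 - C\sum_I Y_I^2 - R^2\sum_J Z_J^2 + CR^2 Y^2\right]$$

$$+ \frac{1}{cC}\left[\sum_{I \ne K}\sum_J \mathrm{cov}\{y_{IJ}, y_{KJ}\} + \frac{c - 1}{C - 1}\sum_{I \ne K}\sum_{J \ne L} \mathrm{cov}\{y_{IJ}, y_{KL}\}\right]$$

$$= \left(\frac{1}{c} - \frac{1}{C}\right)\left[-(R - 1)\sigma_{int}^2 + R(R - 1)\sigma_{col}^2\right] + \frac{R(R - 1)}{c}\left[\rho_3 + (c - 1)\rho_4\right]. \tag{2}$$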


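If we go back to (1), and put $K = I$, we have

$$\mathrm{var}\{y_I\} = \left(\frac{1}{c} - \frac{1}{C}\right)\frac{1}{C - 1}\left[\sum_J Y_{IJ}^2 - C Y_I^2\right]$$

$$+ \frac{1}{cC}\left[\sum_J \mathrm{var}\{y_{IJ}\} + \frac{c - 1}{C - 1}\sum_{J \ne L} \mathrm{cov}\{y_{IJ}, y_{IL}\}\right], \tag{3}$$

and summing over $I$, we find

$$\sum_I \mathrm{var}\{y_I\} = \left(\frac{1}{c} - \frac{1}{C}\right)\frac{1}{C - 1}\left[\sum_I \sum_J Y_{IJ}^2 - C\sum_I Y_I^2\right]$$

$$+ \frac{1}{cC}\left[\sum_I \sum_J \mathrm{var}\{y_{IJ}\} + \frac{c - 1}{C - 1}\sum_I \sum_{J \ne L} \mathrm{cov}\{y_{IJ}, y_{IL}\}\right]$$

$$= \left(\frac{1}{c} - \frac{1}{C}\right)\left[(R - 1)\sigma_{int}^2 + R\sigma_{col}^2\right] + \frac{R}{c}\left[\rho_1 + (c - 1)\rho_2\right]$$

$$= R\left[\left(\frac{1}{c} - \frac{1}{C}\right)\left(1 - \frac{1}{R}\right)\sigma_{int}^2 + \left(\frac{1}{c} - \frac{1}{C}\right)\sigma_{col}^2 + \frac{1}{c}\left(\rho_1 + (c - 1)\rho_2\right)\right]. \tag{4}$$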

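When we recall that the relation of $y$ to $y_I$ and $Y_I$ is entirely analogous to that
of $y_I$ to $y_{IJ}$ and $Y_{IJ}$, we see that we can paraphrase (3) to give

$$\mathrm{var}\{y\} = \left(\frac{1}{r} - \frac{1}{R}\right)\frac{1}{R - 1}\left[\sum_I Y_I^2 - RY^2\right] + \frac{1}{rR}\left[\sum_I \mathrm{var}\{y_I\} + \frac{r - 1}{R - 1}\sum_{I \ne K} \mathrm{cov}\{y_I, y_K\}\right]$$

$$= \left(\frac{1}{r} - \frac{1}{R}\right)\sigma_{row}^2 + \frac{1}{rR}\left(\frac{1}{c} - \frac{1}{C}\right)\left[(R - 1)\sigma_{int}^2 + R\sigma_{col}^2\right] + \frac{1}{rc}\left[\rho_1 + (c - 1)\rho_2\right]$$

$$+ \frac{r - 1}{rR(R - 1)}\left(\frac{1}{c} - \frac{1}{C}\right)\left[-(R - 1)\sigma_{int}^2 + R(R - 1)\sigma_{col}^2\right] + \frac{r - 1}{rc}\left[\rho_3 + (c - 1)\rho_4\right]$$

$$= \left(\frac{1}{r} - \frac{1}{R}\right)\sigma_{row}^2 + \left(\frac{1}{c} - \frac{1}{C}\right)\sigma_{col}^2 + \left(\frac{1}{r} - \frac{1}{R}\right)\left(\frac{1}{c} - \frac{1}{C}\right)\sigma_{int}^2$$

$$+ \frac{1}{rc}\left[\rho_1 + (c - 1)\rho_2 + (r - 1)\rho_3 + (r - 1)(c - 1)\rho_4\right]. \tag{5}$$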


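We shall also be interested in the combination,

$$\frac{1}{R}\sum_I \mathrm{var}\{y_I\} - \mathrm{var}\{y\} = -\left(\frac{1}{r} - \frac{1}{R}\right)\sigma_{row}^2 + \left(1 - \frac{1}{r}\right)\left(\frac{1}{c} - \frac{1}{C}\right)\sigma_{int}^2$$

$$+ \frac{r - 1}{rc}\left[\rho_1 - \rho_3 + (c - 1)(\rho_2 - \rho_4)\right], \tag{6}$$

which becomes, on adding

$$\frac{1}{R}\sum_I (\mathrm{ave}\{y_I\})^2 - (\mathrm{ave}\{y\})^2 = \frac{1}{R}\left[\sum_I Y_I^2 - RY^2\right] = \left(1 - \frac{1}{R}\right)\sigma_{row}^2$$

on corresponding sides,

$$\mathrm{ave}\left\{\frac{1}{R}\sum_I y_I^2 - y^2\right\} = \left(1 - \frac{1}{r}\right)\left[\sigma_{row}^2 + \left(\frac{1}{c} - \frac{1}{C}\right)\sigma_{int}^2 + \frac{1}{c}\left(\rho_1 - \rho_3 + (c - 1)(\rho_2 - \rho_4)\right)\right]. \tag{7}$$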


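When we observe that the expression for $\sum_J \mathrm{var}\{z_J\}$ follows from that for
$\sum_I \mathrm{var}\{y_I\}$ by symmetry (interchanging c with r, C with R, and "rows" with
"col"), we see that we can easily evaluate another combination

$$\sum_I \sum_J \mathrm{var}\{y_{IJ}\} - C\sum_I \mathrm{var}\{y_I\} - R\sum_J \mathrm{var}\{z_J\} + RC\,\mathrm{var}\{y\}$$

$$= RC\left[\left(1 - \frac{1}{r}\right)\left(1 - \frac{1}{c}\right) - \left(1 - \frac{1}{R}\right)\left(1 - \frac{1}{C}\right)\right]\sigma_{int}^2 + RC\left(1 - \frac{1}{r}\right)\left(1 - \frac{1}{c}\right)\left[\rho_1 - \rho_2 - \rho_3 + \rho_4\right], \tag{8}$$

which becomes, on adding

$$\sum_I \sum_J (\mathrm{ave}\{y_{IJ}\})^2 - C\sum_I (\mathrm{ave}\{y_I\})^2 - R\sum_J (\mathrm{ave}\{z_J\})^2 + RC(\mathrm{ave}\{y\})^2$$

$$= \sum_I \sum_J Y_{IJ}^2 - C\sum_I Y_I^2 - R\sum_J Z_J^2 + RCY^2 = (R - 1)(C - 1)\sigma_{int}^2$$

to corresponding sides,

$$\mathrm{ave}\left\{\sum_I \sum_J y_{IJ}^2 - C\sum_I y_I^2 - R\sum_J z_J^2 + RCy^2\right\} = RC\left(1 - \frac{1}{r}\right)\left(1 - \frac{1}{c}\right)\left[\sigma_{int}^2 + \rho_1 - \rho_2 - \rho_3 + \rho_4\right]. \tag{9}$$

These formulas will provide us with the desired results. For our limited purposes,
they are the fundamentals of bisampling.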
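
The two average-value formulas of this section, for $\frac{1}{R}\sum_I y_I^2 - y^2$ and for $\sum\sum y_{IJ}^2 - C\sum y_I^2 - R\sum z_J^2 + RCy^2$, can be checked exactly in the special case where the $y_{IJ}$ are fixed numbers, so that every mean variance and covariance vanishes. The sketch below is an illustration under that assumption only; the function name `bisampling_check` and the array are hypothetical. It enumerates all samples of r rows and c columns with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import combinations


def bisampling_check(Y, r, c):
    """Compare both sides of the two average-value formulas when the y_IJ are
    fixed (all mean variances and covariances zero). Needs R, C >= 2."""
    R, C = len(Y), len(Y[0])
    Y = [[Fraction(v) for v in row] for row in Y]
    Y_row = [sum(row) / C for row in Y]
    Z_col = [sum(Y[i][j] for i in range(R)) / R for j in range(C)]
    Y_all = sum(Y_row) / R
    var_row = sum((m - Y_all) ** 2 for m in Y_row) / (R - 1)
    var_int = sum((Y[i][j] - Y_row[i] - Z_col[j] + Y_all) ** 2
                  for i in range(R) for j in range(C)) / ((R - 1) * (C - 1))
    sum_sq = sum(v * v for row in Y for v in row)
    total_rows = total_int = Fraction(0)
    count = 0
    for rows in combinations(range(R), r):
        for cols in combinations(range(C), c):
            # y_I and z_J are defined for every row and column of the
            # underlying array, whether or not it was sampled.
            y_I = [sum(Y[i][j] for j in cols) / c for i in range(R)]
            z_J = [sum(Y[i][j] for i in rows) / r for j in range(C)]
            y = sum(y_I[i] for i in rows) / r
            total_rows += sum(v * v for v in y_I) / R - y * y
            total_int += (sum_sq - C * sum(v * v for v in y_I)
                          - R * sum(v * v for v in z_J) + R * C * y * y)
            count += 1
    shrink_r, shrink_c = 1 - Fraction(1, r), 1 - Fraction(1, c)
    rhs_rows = shrink_r * (var_row + (Fraction(1, c) - Fraction(1, C)) * var_int)
    rhs_int = R * C * shrink_r * shrink_c * var_int
    return total_rows / count, rhs_rows, total_int / count, rhs_int


lhs_rows, rhs_rows, lhs_int, rhs_int = bisampling_check(
    [[1, 4, 2], [0, 3, 5], [6, 1, 1], [2, 2, 7]], r=2, c=2)
```

Exact fractions are used so that agreement of the two sides is an identity rather than a matter of rounding.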


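24. The replicated two-way classification. In order to develop the formulas
for the remaining average mean squares, we shall find it convenient to think
of the sampling of n from N as taking place first, and taking place within each
of the RC cells of the underlying array, and of the sampling of r rows from R
and c columns from C as taking place later. Taking this attitude, we can work
with the means determined for each of the RC underlying cells after the sampling
within cells and before the sampling of rows and columns. We take $y_{IJ}$ as the
mean for the $IJ$th cell. We then have

$$\mathrm{ave}\{y_{IJ}\} = Y_{IJ} = X_{IJ},$$

$$\mathrm{var}\{y_{IJ}\} = \left(\frac{1}{n} - \frac{1}{N}\right)\sigma_{IJ}^2,$$

$$\mathrm{cov}\{y_{IJ}, y_{KL}\} = 0, \quad \text{if } (I, J) \ne (K, L),$$

and

$$\sigma_{row}^2 = \frac{1}{R - 1}\sum_I (X_{I-} - X_{--})^2 = \sigma_R^2,$$

$$\sigma_{col}^2 = \frac{1}{C - 1}\sum_J (X_{-J} - X_{--})^2 = \sigma_C^2,$$

$$\sigma_{int}^2 = \frac{1}{(R - 1)(C - 1)}\sum_I \sum_J (X_{IJ} - X_{I-} - X_{-J} + X_{--})^2 = \sigma_{RC}^2,$$

$$\rho_1 = \left(\frac{1}{n} - \frac{1}{N}\right)\frac{1}{RC}\sum_I \sum_J \sigma_{IJ}^2 = \left(\frac{1}{n} - \frac{1}{N}\right)\sigma_E^2,$$

$$\rho_2 = \rho_3 = \rho_4 = 0.$$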


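The row mean square can be written in various forms, including

$$\frac{nc}{r - 1}\sum_{i=1}^{r} (x_{i\cdot\cdot} - x_{\cdots})^2 = \frac{nc}{r - 1}\left[\sum_I a_I y_I^2 - ry^2\right].$$

If we use first the independence of $a_I$ from $y_I$ and then (7) of the last section, the
average value of this last form becomes

$$\frac{ncr}{r - 1}\,\mathrm{ave}\left\{\frac{1}{R}\sum_I y_I^2 - y^2\right\} = nc\left[\sigma_R^2 + \left(\frac{1}{c} - \frac{1}{C}\right)\sigma_{RC}^2 + \frac{1}{c}\left(\frac{1}{n} - \frac{1}{N}\right)\sigma_E^2\right]$$

$$= nc\,\sigma_R^2 + n\left(1 - \frac{c}{C}\right)\sigma_{RC}^2 + \left(1 - \frac{n}{N}\right)\sigma_E^2,$$

as we wished to show. The average value of the columns mean square follows by
symmetry.

The interaction mean square can also be written in many forms, some of which
are

$$\frac{n}{(r - 1)(c - 1)}\sum_i \sum_j (x_{ij\cdot} - x_{i\cdot\cdot} - x_{\cdot j\cdot} + x_{\cdots})^2$$

$$= \frac{n}{(r - 1)(c - 1)}\left[\sum_i \sum_j x_{ij\cdot}^2 - c\sum_i x_{i\cdot\cdot}^2 - r\sum_j x_{\cdot j\cdot}^2 + rc\,x_{\cdots}^2\right]$$

$$= \frac{n}{(r - 1)(c - 1)}\left[\sum_I \sum_J a_I b_J y_{IJ}^2 - c\sum_I a_I y_I^2 - r\sum_J b_J z_J^2 + rc\,y^2\right].$$

If again we use the independence of $a_I$ and $b_J$ from each other and the other
quantities, and then (9) of the last section, the average value of the last form is
seen to be

$$n\,\sigma_{RC}^2 + \left(1 - \frac{n}{N}\right)\sigma_E^2,$$

as we wished to show. Thus we have the average values of the mean squares for
the two-way pigeonhole model with replication.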


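25. The unreplicated three-way classification. To deal with the three-way
pigeonhole model, where r rows from R, c columns from C, and s slices from S
are independently sampled, we start out to calculate the average mean squares
for rows, columns, and their interaction just as for the replicated two-way.
Differences will first appear when we come to calculate $\rho_1$, $\rho_2$, $\rho_3$, and $\rho_4$, which
is most simply done in an indirect way.

We remark that, now, we have

$$y_{IJ} = \frac{1}{s}\sum_{K=1}^{S} c_K x_{IJK}, \qquad \mathrm{ave}\{y_{IJ}\} = Y_{IJ} = \frac{1}{S}\sum_{K=1}^{S} x_{IJK} = x_{IJ-},$$

where $\{c_K\}$ are a new set of indicator variables which specify the sampling of s
slices from S. Now

$$\mathrm{var}\{y_{IJ}\} = \left(\frac{1}{s} - \frac{1}{S}\right)\frac{1}{S - 1}\left[\sum_K x_{IJK}^2 - S x_{IJ-}^2\right]$$

since we have a sample mean of s from S, so that

$$RC\rho_1 = \sum_I \sum_J \mathrm{var}\{y_{IJ}\} = \left(\frac{1}{s} - \frac{1}{S}\right)\frac{1}{S - 1}\left[\sum_I \sum_J \sum_K x_{IJK}^2 - S\sum_I \sum_J x_{IJ-}^2\right].$$

Moreover

$$\mathrm{var}\left\{\sum_J y_{IJ}\right\} = \left(\frac{1}{s} - \frac{1}{S}\right)\frac{1}{S - 1}\left[\sum_K \left(\sum_J x_{IJK}\right)^2 - S\left(\sum_J x_{IJ-}\right)^2\right]$$

$$= \left(\frac{1}{s} - \frac{1}{S}\right)\frac{1}{S - 1}\left[C^2\sum_K x_{I-K}^2 - C^2 S x_{I--}^2\right],$$

since $\sum_J y_{IJ}$ is the mean of a sample of s from the S values $\sum_J x_{IJK}$, and hence

$$RC\rho_1 + RC(C - 1)\rho_2 = \sum_I \mathrm{var}\left\{\sum_J y_{IJ}\right\} = \left(\frac{1}{s} - \frac{1}{S}\right)\frac{C^2}{S - 1}\left[\sum_I \sum_K x_{I-K}^2 - S\sum_I x_{I--}^2\right].$$

By symmetry, then

$$RC\rho_1 + R(R - 1)C\rho_3 = \left(\frac{1}{s} - \frac{1}{S}\right)\frac{R^2}{S - 1}\left[\sum_J \sum_K x_{-JK}^2 - S\sum_J x_{-J-}^2\right]$$

and, indeed,

$$RC\rho_1 + RC(C - 1)\rho_2 + R(R - 1)C\rho_3 + R(R - 1)C(C - 1)\rho_4$$

$$= \mathrm{var}\left\{\sum_I \sum_J y_{IJ}\right\} = \left(\frac{1}{s} - \frac{1}{S}\right)\frac{1}{S - 1}\left[\sum_K \left(\sum_I \sum_J x_{IJK}\right)^2 - S\left(\sum_I \sum_J x_{IJ-}\right)^2\right]$$

$$= \left(\frac{1}{s} - \frac{1}{S}\right)\frac{R^2 C^2}{S - 1}\left[\sum_K x_{--K}^2 - S x_{---}^2\right].$$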

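We now have the basis for evaluating the $\rho$'s.
We introduce the additional variance components for the three-way
classification by

$$\sigma_S^2 = \frac{1}{S - 1}\left[\sum_K x_{--K}^2 - S x_{---}^2\right],$$

$$\sigma_{RS}^2 = \frac{1}{(R - 1)(S - 1)}\left[\sum_I \sum_K x_{I-K}^2 - R\sum_K x_{--K}^2 - S\sum_I x_{I--}^2 + RS\,x_{---}^2\right],$$

$$\sigma_{CS}^2 = \frac{1}{(C - 1)(S - 1)}\left[\sum_J \sum_K x_{-JK}^2 - C\sum_K x_{--K}^2 - S\sum_J x_{-J-}^2 + CS\,x_{---}^2\right],$$

and finally,

$$\sigma_{RCS}^2 = \frac{1}{(R - 1)(C - 1)(S - 1)}\Big[\sum_I \sum_J \sum_K x_{IJK}^2 - R\sum_J \sum_K x_{-JK}^2 - C\sum_I \sum_K x_{I-K}^2$$

$$- S\sum_I \sum_J x_{IJ-}^2 + RC\sum_K x_{--K}^2 + RS\sum_J x_{-J-}^2 + CS\sum_I x_{I--}^2 - RCS\,x_{---}^2\Big].$$

In terms of these quantities we can now calculate the values of the $\rho$'s. We can
conveniently use

$$\sum_I \sum_K x_{I-K}^2 - S\sum_I x_{I--}^2 = (R - 1)(S - 1)\sigma_{RS}^2 + R\left[\sum_K x_{--K}^2 - S x_{---}^2\right]$$

$$= (R - 1)(S - 1)\sigma_{RS}^2 + R(S - 1)\sigma_S^2,$$

and, by symmetry,

$$\sum_J \sum_K x_{-JK}^2 - S\sum_J x_{-J-}^2 = (C - 1)(S - 1)\sigma_{CS}^2 + C(S - 1)\sigma_S^2,$$

and, by a similar though longer calculation,

$$\sum_I \sum_J \sum_K x_{IJK}^2 - S\sum_I \sum_J x_{IJ-}^2 = (R - 1)(C - 1)(S - 1)\sigma_{RCS}^2$$

$$+ R(C - 1)(S - 1)\sigma_{CS}^2 + (R - 1)C(S - 1)\sigma_{RS}^2 + RC(S - 1)\sigma_S^2.$$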


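Substituting these into the earlier formulas, and removing factors where
convenient, yields

$$\rho_1 = \left(\frac{1}{s} - \frac{1}{S}\right)\left[\left(1 - \frac{1}{R}\right)\left(1 - \frac{1}{C}\right)\sigma_{RCS}^2 + \left(1 - \frac{1}{C}\right)\sigma_{CS}^2 + \left(1 - \frac{1}{R}\right)\sigma_{RS}^2 + \sigma_S^2\right],$$

$$\rho_1 + (C - 1)\rho_2 = \left(\frac{1}{s} - \frac{1}{S}\right)C\left[\left(1 - \frac{1}{R}\right)\sigma_{RS}^2 + \sigma_S^2\right],$$

$$\rho_1 + (R - 1)\rho_3 = \left(\frac{1}{s} - \frac{1}{S}\right)R\left[\left(1 - \frac{1}{C}\right)\sigma_{CS}^2 + \sigma_S^2\right],$$

$$\rho_1 + (C - 1)\rho_2 + (R - 1)\rho_3 + (R - 1)(C - 1)\rho_4 = \left(\frac{1}{s} - \frac{1}{S}\right)RC\,\sigma_S^2.$$

From these we could compute all the $\rho$'s, but it is simpler to find the
combinations which will be of most use to us. These are

$$\rho_1 - \rho_3 = \left(\frac{1}{s} - \frac{1}{S}\right)\left[\left(1 - \frac{1}{C}\right)\sigma_{RCS}^2 + \sigma_{RS}^2\right],$$

$$\rho_2 - \rho_4 = \left(\frac{1}{s} - \frac{1}{S}\right)\left[-\frac{1}{C}\sigma_{RCS}^2 + \sigma_{RS}^2\right],$$

whence

$$\rho_1 - \rho_3 + (c - 1)(\rho_2 - \rho_4) = \left(\frac{1}{s} - \frac{1}{S}\right)\left[\left(1 - \frac{c}{C}\right)\sigma_{RCS}^2 + c\,\sigma_{RS}^2\right],$$

$$\rho_1 - \rho_2 - \rho_3 + \rho_4 = \left(\frac{1}{s} - \frac{1}{S}\right)\sigma_{RCS}^2.$$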


We are now ready to return to the evaluations of Section 23, where we found
the average mean squares for rows to be

$$nc\,\sigma_R^2 + n\left(1 - \frac{c}{C}\right)\sigma_{RC}^2 + n\left[\rho_1 - \rho_3 + (c - 1)(\rho_2 - \rho_4)\right].$$

We have now to write $s$ in place of $n$, $x_{IJ-}$ in place of $X_{IJ}$, and to substitute in the
value we have just found for the quantity in brackets. We obtain, then, for the
average value of a row mean square in a pigeonhole three-way

$$cs\,\sigma_R^2 + s\left(1 - \frac{c}{C}\right)\sigma_{RC}^2 + c\left(1 - \frac{s}{S}\right)\sigma_{RS}^2 + \left(1 - \frac{c}{C}\right)\left(1 - \frac{s}{S}\right)\sigma_{RCS}^2.$$

The average values of column and slice mean squares now follow by symmetry.
We had obtained

$$n\,\sigma_{RC}^2 + n\left[\rho_1 - \rho_2 - \rho_3 + \rho_4\right]$$

for the average value of the interaction mean square. Making the necessary
changes, the average value of the RC-interaction mean square for the three-way
pigeonhole becomes

$$s\,\sigma_{RC}^2 + \left(1 - \frac{s}{S}\right)\sigma_{RCS}^2,$$

and the average values for RS and CS interactions follow by symmetry. There
remains the triple interaction.


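26. The triple interaction. To get a simple hold on the triple interaction, we
will find it convenient to introduce some differencing operators with the
following definitions:

$$\delta_{ii'} f(i, j, k) = \frac{1}{\sqrt{2}}\left(f(i, j, k) - f(i', j, k)\right),$$

$$\Delta_{II'} f(I, J, K) = \frac{1}{\sqrt{2}}\left(f(I, J, K) - f(I', J, K)\right),$$

and similarly for other indices. The usefulness of these operators stems from the
following chain of representations for interaction sums of squares:

$$\sum_i x_{i\cdot}^2 - r x_{\cdot\cdot}^2 = \frac{1}{r}\sum_i \sum_{i'} (\delta_{ii'} x_{i\cdot})^2,$$

$$\sum_i \sum_j x_{ij}^2 - r\sum_j x_{\cdot j}^2 - c\sum_i x_{i\cdot}^2 + rc\,x_{\cdot\cdot}^2 = \frac{1}{rc}\sum_i \sum_{i'} \sum_j \sum_{j'} (\delta_{ii'}\delta_{jj'} x_{ij})^2,$$

$$\sum_i \sum_j \sum_k x_{ijk}^2 - r\sum_j \sum_k x_{\cdot jk}^2 - c\sum_i \sum_k x_{i\cdot k}^2 - s\sum_i \sum_j x_{ij\cdot}^2$$

$$+ rs\sum_j x_{\cdot j\cdot}^2 + rc\sum_k x_{\cdot\cdot k}^2 + cs\sum_i x_{i\cdot\cdot}^2 - rcs\,x_{\cdots}^2 = \frac{1}{rcs}\sum_i \sum_{i'} \sum_j \sum_{j'} \sum_k \sum_{k'} (\delta_{ii'}\delta_{jj'}\delta_{kk'} x_{ijk})^2,$$

and so on.
To establish these representations we have only to prove (as we may by direct
expansion), that

$$\sum_{h=1}^{m} u_h^2 - m u_{\cdot}^2 = \frac{1}{m}\sum_h \sum_{h'} (\delta_{hh'} u_h)^2$$

and apply induction, writing, for example, the second interaction in the forms

$$\left[\sum_i \sum_j x_{ij}^2 - r\sum_j x_{\cdot j}^2\right] - c\left[\sum_i x_{i\cdot}^2 - r x_{\cdot\cdot}^2\right]$$

$$= \sum_j \left[\frac{1}{r}\sum_i \sum_{i'} (\delta_{ii'} x_{ij})^2\right] - \frac{c}{r}\sum_i \sum_{i'} (\delta_{ii'} x_{i\cdot})^2$$

$$= \frac{1}{r}\sum_i \sum_{i'} \left[\sum_j (\delta_{ii'} x_{ij})^2 - c(\delta_{ii'} x_{i\cdot})^2\right]$$

$$= \frac{1}{r}\sum_i \sum_{i'} \left[\frac{1}{c}\sum_j \sum_{j'} (\delta_{jj'}\delta_{ii'} x_{ij})^2\right],$$

where we have used the identity twice and the fact that the mean value of
$\delta_{ii'} x_{ij}$ over $j$ is $\delta_{ii'} x_{i\cdot}$ itself.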
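
The basic identity and the two-way representation can be confirmed numerically. The following Python sketch is illustrative only: the function names and data are hypothetical, and it assumes the differencing operator carries the normalization $1/\sqrt{2}$, so that each squared difference contributes one half (one quarter for a double difference).

```python
from fractions import Fraction


def corrected_sum_of_squares(u):
    """Left side of the identity: sum of u_h^2 minus m times the mean squared."""
    m = len(u)
    mean = Fraction(sum(u), m)
    return sum(Fraction(v) ** 2 for v in u) - m * mean ** 2


def via_differences(u):
    """Right side: (1/m) * sum over h, h' of ((u_h - u_h') / sqrt 2)^2."""
    m = len(u)
    return sum(Fraction((a - b) ** 2, 2) for a in u for b in u) / m


def interaction_ss(x):
    """Two-way interaction sum of squares written with row and column means."""
    r, c = len(x), len(x[0])
    row = [Fraction(sum(x[i]), c) for i in range(r)]
    col = [Fraction(sum(x[i][j] for i in range(r)), r) for j in range(c)]
    grand = Fraction(sum(map(sum, x)), r * c)
    return (sum(Fraction(v) ** 2 for line in x for v in line)
            - r * sum(v * v for v in col) - c * sum(v * v for v in row)
            + r * c * grand ** 2)


def interaction_via_differences(x):
    """Same quantity as (1/rc) times the sum of squared double differences."""
    r, c = len(x), len(x[0])
    total = Fraction(0)
    for i in range(r):
        for i2 in range(r):
            for j in range(c):
                for j2 in range(c):
                    d = x[i][j] - x[i2][j] - x[i][j2] + x[i2][j2]
                    total += Fraction(d * d, 4)  # (1/sqrt 2)^2 applied twice
    return total / (r * c)


u = [1, 4, 2, 7]
x = [[1, 4, 2], [0, 3, 5]]
one_way_pair = (corrected_sum_of_squares(u), via_differences(u))
two_way_pair = (interaction_ss(x), interaction_via_differences(x))
```

The double-difference form makes it plain why terms with $i = i'$ or $j = j'$ contribute nothing, which is the property used when indicator variables are introduced.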
These representations lead at once to average value formulas. Thus, in the 
one-way case 
( 


( \ 
ave {> ai ss rz°| = ave \; X » (ow 2) = ave ‘; » » ary (Anr 1)" 


r 


r(r — 1) r— l . 
>> RRND R(R — 1) 1) (An a1)" -(% > SS) R X » (Ary 21) 


r—l1 2 >.2 
a [X Zr R 2), 
where we have used the fact that a;a, equals r(r — 1)/R(R — 1) except when 


I = I’, and the fact that we can neglect J = I’ because 6;,2; vanishes for 
I = I’. Similarly, in a two-way case 


ave {> D~t—r> 2 —c)d 2 + rex..)'} 
i i j 7 


= ave ‘2 x a DD (Bs by 4)*\ 
~~ Ss eC USE ) 
~ ave (i » x » a,a, b,; by (Arr Ags 21) } 


_— r(r — 1) 4 
TC R(R — 1) C(C — I) 5 x X » (An Ass 213) 


we — 1)(e — 1) ‘ . : : 
— 1)(C — 1) [x dai “2 » ty—C » x; + RCz~_] 


and so on. The next case, for the three-way case, shows us that the average value 
of the RC'S-interaction is exactly ozcs as we wished to prove. 

Clearly the extensions of this argument to the four-way, five-way, etc., classi- 
fications will always give a similar answer for the lowest interaction (the one 
involving all the indices!) for any factorial. 
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27. Further generalization. It is just a matter of moderately stiff algebra 
to carry on an inductive proof of similar results for more classifications. If 
ave {yrx} = Yrsx , and if g({y;}) is a quadratic in the observed y;;, , then 


ave {q({yise}) = ave {q({Vise})} 
+ (a linear form in variances and covariances of the (yij) 


Now the capital Y’s correspond to an analysis without replication, so that, for 
the q({ Yij}) we are interested in, the averages on the right can be calculated 
from the results of the last section. Because of symmetry, the linear form in 
variances and covariances will be a linear combination of 8 quantities, the mean 
variance, and seven mean covariances. Working these expressions out, and then 
calculating the values of these mean variance and covariances in special cases, we 
can obtain the average values of mean squares for the replicated three-way and 
unreplicated four-way designs. 

After this we are ready for another step of the induction, and so on. 

Clearly systematic algebra can take us deep into the forest of notation. But 
the detailed manipulation will, sooner or later, blot out any understanding we 
may have started with. If there is a way of seeing some aspects of the final result 
more directly, then it will be worth while to seize it. 

There are a number of such ways involving special tools or devices of varying
complexity. Since we have tried to keep the approach of this paper reasonably
pedestrian (although indicator variables and the $\delta_{ii'}$ and $\Delta_{II'}$
may be regarded as the equivalent of roller skates!), we shall try to use the
least special way that we know.


28. More direct insight. Let us ask about the coefficient with which
$\sigma^2_{RCDEFGS}$ appears in the average value of the mean square of the
RCDE-interaction in an 11-way design with factors labelled R, C, D, E, F, G, H,
J, K, L, S. More specifically, let us ask how the coefficient depends on the
values of s and S. It will appear that we can answer this rather directly.

There will exist some formulas which make up the fundamentals of deci-sampling.
Apply them to the 10-way classification involving R, C, D, E, F, G, H, J, K, L,
that is, all the classifications but S. They will give the average value of the
RCDE-interaction in terms of $\sigma^2$’s whose indices do not include S and of a
mean variance and $2^{10} - 1 = 1023$ mean covariances. The latter 1024 quantities
will be expressible in terms of the $\sigma^2$’s which involve S by a process
entirely similar (though more complicated in detail) to that used in Section 15.
Each of the 1024 formulas for a linear combination of mean variance and
covariances will involve a factor

$$\Big(\frac{1}{s} - \frac{1}{S}\Big),$$

since, in every case, we shall be sampling s out of S. Hence, when all the
algebraic dust has quieted down, the coefficient of $\sigma^2_{RCDEFGS}$ will
include the same factor, and will depend in no other way on S. It will depend on
s in a further way, since finding the row, column, D-variable, and E-variable
will still leave us with fghjkls different cells. Thus a factor of this value
will appear in the mean square for RCDE when that mean square is written out in
terms of means (not totals). Thus the complete dependence on s and S will be

$$\Big(\frac{1}{s} - \frac{1}{S}\Big)s = \Big(1 - \frac{s}{S}\Big).$$

With this result, and the result, obtained incidentally, that h and H will enter
only through a factor h, we are essentially finished.

By symmetry, we see that the term in the mean-value of the RCDE-interaction
with which we are concerned is

$$(\text{constant})\Big(1 - \frac{f}{F}\Big)\Big(1 - \frac{g}{G}\Big)\,hjkl\,\Big(1 - \frac{s}{S}\Big)\sigma^2_{RCDEFGS}.$$

If we simplify matters by choosing h = H = j = J = k = K = l = L = 1,
the whole 11-way classification condenses to a 7-way classification and if,
moreover, we revert to Model II (everything independent and normal), we obtain

$$(\text{constant})\,\sigma^2_{RCDEFGS}$$

as a term in the average value of the RCDE-mean square, and it is well known
that the constant must now be unity.
This provides a not-too-indirect proof for the rules set forth in Section 11. 
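
As an illustration of the coefficient just obtained, the following sketch
computes it for arbitrary sample and population sizes. It is not taken from the
paper: the function name, the use of `None` for an infinite population, and the
numerical sizes are assumptions made for the example, and the constant is taken
to be unity as argued above. Only components whose indices include those of the
mean square are handled.

```python
from fractions import Fraction


def coefficient(mean_square, component, sample, population):
    """Coefficient of sigma^2_{component} in the average mean square.

    mean_square, component: strings of factor letters, e.g. "RCDE".
    sample, population: dicts mapping each factor letter to its sample size
    and population size (None stands for an infinite population).
    """
    if not set(mean_square) <= set(component):
        raise ValueError("sketch covers only components containing "
                         "the factors of the mean square")
    value = Fraction(1)
    for factor, n in sample.items():
        if factor not in component:
            value *= n                       # e.g. h, j, k, l in the text
        elif factor not in mean_square:
            N = population[factor]
            if N is not None:
                value *= 1 - Fraction(n, N)  # e.g. (1 - s/S)
    return value


factors = "RCDEFGHJKLS"
# Hypothetical sizes: f/F = 2/4, g/G = 3/6, s/S = 2/10, and h, j, k, l = 2, 3, 1, 5.
sample = dict(zip(factors, [2, 2, 2, 2, 2, 3, 2, 3, 1, 5, 2]))
population = dict(zip(factors, [4, 4, 4, 4, 4, 6, 8, 9, 1, 5, 10]))

# (1 - 2/4)(1 - 3/6) * (2 * 3 * 1 * 5) * (1 - 2/10) = 6
assert coefficient("RCDE", "RCDEFGS", sample, population) == 6

# With h = j = k = l = 1 and infinite populations (Model II) the coefficient is 1.
model_two = {f: None for f in factors}
reduced = dict(sample, H=1, J=1, K=1, L=1)
assert coefficient("RCDE", "RCDEFGS", reduced, model_two) == 1
```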

SOME ASPECTS OF THE ANALYSIS OF FACTORIAL EXPERIMENTS
IN A COMPLETELY RANDOMIZED DESIGN¹


By M. B. Wilk² and O. Kempthorne³


Iowa State College and Princeton University; Iowa State College


1. Introduction. This paper is concerned with some aspects of the statistical 
analysis of factorial experiments carried out according to a completely ran- 
domized design, and is one of the joint portions of an investigation into the role 
and meaning of linear statistical models in the analysis of randomized experi- 
ments. 

There are essentially two ways of obtaining the analysis of data obtained in a
comparative experiment. One way, which is given in standard texts, is to write
down a model of the type

$$y_{ijk\cdots} = \mu + a_i + b_j + \cdots \text{ etc.},$$

where $y_{ijk\cdots}$ is the observation and the terms on the right-hand side are
fixed unknown constants or random variables with specified properties. The above
equation with a complete statement of all the properties of the quantities
contained in it is usually called the model for the experiment. The texts and the
literature are, to the best of our knowledge, with a few exceptions to be
mentioned later, bare with regard to how one determines the model, how one
answers a question such as “Why not a multiplicative model?” or “Why are the a’s
fixed and the b’s random?” The other way is that practiced intuitively by many
experimental statisticians and described most aptly by Fisher ([3], [4], [5],
[6]) in which (a) one envisages an analysis of variance of the observations from
the point of view of topography, apart from treatment, such as for instance in a
field experiment by rows, columns, plots within row-column cells, etc.; (b) one
envisages an analysis of variance by treatments; (c) one notes how the
treatments have been assigned to the experimental material, such as, for
instance, factor $\mathcal{A}$ to rows; and (d) one therefore sees with which
part of the topographical analysis any particular component of the treatment
breakdown should be associated.

The second procedure cannot be regarded as fully specified by what is said 
above. The first procedure can only be regarded as arbitrary unless some logical 
basis can be given for it. It is to the problem implied in the last sentence that
we have addressed our work.

In preparing this paper for publication we have had the benefit of specific
and general criticism and suggestions from Professor John Tukey, whose
assistance and advice it is a pleasure to acknowledge.


2. Relation to other work. The general history of the line of attack is given 
by Wilk and Kempthorne [15]. Since that time Smith [10], Scheffé [8], and
Cornfield and Tukey [1] have worked on the general problems indicated above.
Cornfield and Tukey [1] also discuss relations between approaches to the problem.


3. Some fundamental concepts. The concepts implied in the words “treatment”
(or “factor level”),⁴ “experimental unit” and “true response” enter
importantly into the developments in the sequel. We shall attempt to convey the
general meaning which these terms have for us.

While recognizing that the term treatment generally (operationally) designates
a category of entities or operations, we shall use it as synonymous with “ideal
treatment” or “typical treatment.” An example of treatment as a category is a
variety of, say, corn with operational representations as individual seeds, so that
the treatment may be thought of as having a nested structure. The conception of
a treatment such as “a temperature of 45°C” is often different. Even if
temperature control is difficult, so that in an actual trial one uses
(45 + ε)°C with ε unknown, one usually feels that it is reasonable to conceive,
at least on a macroscopic scale, of a “true or ideal treatment” of 45°C, in the
attainment of which we are frustrated by physical difficulties.

In most cases it is useful to introduce explicitly the notion of a “treatment
error” which will reflect the difficulty in attaining or reproducing a conceptually
meaningful ideal. In this paper we shall take such a view. The case when the 
treatment should properly be regarded as a member of a well-defined population 
will be given in a later paper. 

A reasonable operational definition of experimental units, though circular to 
some extent, is “those entities in an experiment to which treatments are assigned 
at random.” It is often possible and useful to think of experimental units as 
physical entities such as plots of land or individual animals, but in many cases 
such a view is misleadingly naive. Extensions of the term to include periods of 
time, states of mind, and other ill-defined complexes of conditions are needed. 
In an agronomic experiment we would regard the unit not simply as a plot of 
ground, but rather as the plot plus weather and other conditions not subject to 
test. In specific instances such a view involving “ultimate identification” of 
experimental units may be too restrictive and could be meaningfully and usefully 
relaxed. In the formal developments in the sequel, we shall be operationally 
deterministic in that we shall regard an experimental unit to be conceptually 
entirely identified so that a given stimulus would produce a definite response. 
This should not be construed to mean that every situation must be fitted exactly 
into such a context for the analysis to be useful. 


⁴ In general a “treatment” is partially specified by a “factor level.” However, most of
our remarks can be read substituting “factor level” for “treatment.”
The notion of “true” or “typical” response seems readily meaningful at least
superficially, and deeper analysis immediately involves one in philosophical 
discussion which is unnecessary in most experimental contexts. 

As regards “experimental error” it may be useful to distinguish between
“physical errors” and “sampling errors”; and in the first category to distinguish
the experimenter’s concern with “systematic errors” from the statistician’s
treatment which usually revolves around an assumption of “random errors.”
Some obvious categories of physical errors, with respect to subjecting a given
experimental unit to a given treatment and observing the response, are errors
of measurement of the response, errors of treatment application, and, from some
points of view, errors dependent on the “physical state” of the experimental
unit. As we shall see, certain sampling errors can be controlled, in a statistical
sense, by the device of randomization. In the analysis of other errors the statis- 
tician and experimenter must rely on judicious assumption based on insight and 
experience. 


4. The experimental situation and design; basic notation. The essence of the
completely randomized design is that no attempt is made to structure the
experimental units; or from another, more accurate viewpoint, no restrictions are
imposed in the random assignment of treatments to available experimental
units.

We shall describe in detail a situation in which treatment combinations of
interest may be classified according to the “levels” of three “factors.” This will
provide enough generality to indicate extension of the methods and results. 
The case of two factors can be obtained formally by considering one factor to 
have only one level. 

The factors (e.g., temperature, varieties, types of acid, etc.) will be denoted
by script letters $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$. The number of
levels of each factor, in the experimental population, will be denoted by the
corresponding capital letters A, B, C. We suppose, for purpose of reference only,
that the levels of each factor are ordered (arbitrarily) and let
$i = 1, 2, \ldots, A$; $j = 1, 2, \ldots, B$; $k = 1, 2, \ldots, C$ denote
the various levels in the populations of levels of factors $\mathcal{A}$,
$\mathcal{B}$, and $\mathcal{C}$, respectively.

Suppose there are P experimental units with respect to which we wish to 
study comparatively the various treatment combinations. (The details of what 
we may be interested in doing will vary with the specific physical situation, but 
some general statistical aspects of what the bare situation and design enable us 
to do remain the same.) Again we suppose, for formal reference, that the units are
ordered, and let $m = 1, 2, \ldots, P$ denote the unit in the population of units.

The experimental design is now defined as follows:

(i) Select a levels from A of factor $\mathcal{A}$ at random.

(ii) Select b levels from B of factor $\mathcal{B}$ at random.

(iii) Select c levels from C of factor $\mathcal{C}$ at random.

(We will use the notation $i^* = 1, 2, \ldots, a$; $j^* = 1, 2, \ldots, b$;
$k^* = 1, 2, \ldots, c$ to denote the randomly selected levels of $\mathcal{A}$,
$\mathcal{B}$, and $\mathcal{C}$ respectively in order of their selection. Thus,
for example, $i^* = 1$ corresponds with probability 1/A to any designated value
of $i$.)

(iv) Select p experimental units at random from P, where

$$p = \sum_{i^*=1}^{a}\sum_{j^*=1}^{b}\sum_{k^*=1}^{c} n_{i^*j^*k^*}, \quad \text{and all } n_{i^*j^*k^*} \ge 1.$$

The values of the $n_{i^*j^*k^*}$ are treated as known prechosen fixed numbers.
(Further explanation of this is given at the end of this section.)

(v) Apply selected factor levels to selected experimental units at random but
so that selected treatment combination $(i^*j^*k^*)$ appears on $n_{i^*j^*k^*}$
of the selected experimental units.
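
The five steps of the design can be sketched as a small simulation. This is an
illustration under stated assumptions rather than a procedure given in the
paper: levels and units are labelled 1, 2, ... as in the text, the function
name and the nested-list layout of the prechosen numbers are hypothetical, and
Python's pseudo-random generator stands in for a physical randomization.

```python
import random


def completely_randomized_design(A, B, C, P, n, seed=None):
    """Carry out steps (i)-(v) once.

    n[i][j][k] is the prechosen number of units for the combination of the
    (i+1)th, (j+1)th and (k+1)th selected levels; every entry must be >= 1.
    Returns a dict mapping each selected population combination (i, j, k)
    to the list of units that receive it.
    """
    rng = random.Random(seed)
    a, b, c = len(n), len(n[0]), len(n[0][0])
    cells = [(i, j, k) for i in range(a) for j in range(b) for k in range(c)]
    if any(n[i][j][k] < 1 for i, j, k in cells):
        raise ValueError("every prechosen number must be at least 1")
    p = sum(n[i][j][k] for i, j, k in cells)
    if p > P:
        raise ValueError("more units requested than the population holds")

    # Steps (i)-(iii): random.sample returns the levels in order of selection.
    levels_a = rng.sample(range(1, A + 1), a)
    levels_b = rng.sample(range(1, B + 1), b)
    levels_c = rng.sample(range(1, C + 1), c)

    # Step (iv): p of the P units, returned in random order.
    units = rng.sample(range(1, P + 1), p)

    # Step (v): consecutive slices of a randomly ordered sample are themselves
    # random, so handing them out cell by cell is a random assignment.
    plan, start = {}, 0
    for i, j, k in cells:
        combination = (levels_a[i], levels_b[j], levels_c[k])
        plan[combination] = units[start:start + n[i][j][k]]
        start += n[i][j][k]
    return plan


# Hypothetical sizes: a = b = c = 2 selected from A = 5, B = 4, C = 3; P = 40.
numbers = [[[2, 1], [1, 3]], [[1, 1], [2, 2]]]
plan = completely_randomized_design(5, 4, 3, 40, numbers, seed=1)
for combination, assigned in sorted(plan.items()):
    print(combination, assigned)
```

Which population combinations (ijk) end up associated with the selected
combinations depends on the random selections, as the text notes at the end of
this section; only the numbers of units per selected combination are fixed in
advance.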

Some purely formal difficulties can arise in this general exposition for the case
of, say, a = A. According to our description above, the identification of levels of
$\mathcal{A}$ by $i^*$ would be a random arrangement of that effected by $i$. In
dealing with symmetric functions, clearly no difficulties arise. The whole matter
can be handled simply by a convention that, for example, when A = a we take $i$
and $i^*$ to be identical indices; or, where non-symmetric functions arise, it can
be handled by an extended notation, as will be seen in the sequel.

It is most natural to think of the design as being imposed upon given back- 
ground populations of levels of factors and of experimental units, but it should 
be pointed out that it is in fact our procedure in the design which determines the 
relevant (statistically) population of treatments and units to which our experi- 
ment applies. Some further discussion of consequences of this point is given below. 

The description above is intended to be general. Cases of fixed, mixed, and 
random model situations are included as special cases. The possibility of equal, 
proportional, or unequal numbers in the “subclasses” of the observations is 
allowed for. In the described set-up the number of observations associated with a 
treatment combination depends on the actual realization of the experiment, 
that is, on the outcome of the random selections, and not, in general, on the 
population of treatments. An important exception to this is the case of fixed 
factors. Thus, we specify that the selected treatment combination $(i^*j^*k^*)$
appear $n_{i^*j^*k^*}$ times; but, in general, the association of $(i^*j^*k^*)$
with values of (ijk) will depend on the random selection process.


5. Some discussion. In the formal description of the experimental situation
and design in the preceding section, the role of experimental units in the
experimental situation and the relation of the sample of units to the population
are specified explicitly.

The population of units defines, in a sense, our experimental milieu or back- 
ground. Even if all units can be thought of as identical (a rare event) many back- 
ground influences (not under direct study) are being “held constant.” For ex- 
ample, 10 cc samples from a well-mixed, non-volatile solution may well be con- 
sidered (aside from pipetting errors) essentially identical. But if, in a two-factor 
experiment, one factor consists of levels of concentration of a reagent and another 
is time allowed for reaction, then it is part of the relevant background for
inference, tied up with the conception of experimental unit, to describe (or at least
keep in mind the existence of) such influences as ambient temperature, baro- 
metric pressure, type and shape of container, etc. Thus our inferences must 
always be interpreted with respect to some population of experimental units, 
even though in specific instances we may be quite certain of the absence of in- 
fluence of certain aspects of the experimental milieu. 

The emphasis on the relation of sample to population is a fundamental
contribution of procedures of modern statistical inference toward scientific
objectivity. In spite of the wide acceptance which, we believe, the preceding
sentence would find, there appears to be some tendency among statisticians to
think of the population to which statistical inferences are to be made to be not
that from which the random sample is obtained but rather one which is indicated
by their “interest.” The key to this difficulty may lie in the failure to
recognize any distinctions between “empirical inferences” based on statistical
techniques and “scientific inferences” based on theories of mechanism,
mechanical analogies, intuition, etc., in addition to statistical inferences.

For example, in a two-factor experiment involving specific insecticides tested 
with respect to a random selection of 15 types of insects from a population of 200 
types of insects we would recognize the statistical validity of two viewpoints in 
evaluating the comparative utility of the insecticides: (i) relative to the entire 
population of 200 types of insects from which we have a random sample; (ii) 
relative to the 15 types of insects actually tested (i.e., the ones which appeared in 
our random selection). There does not appear to be any general justification in 
attempting, on the basis of data relating to 15 non-randomly selected types of 
insects (as, for example, those prominent in a certain region), to extend the 
statistical (empirical) inference to some broader, undefined, population of insect 
types. There can be no question as to the need or importance of making such an 
extension, but such extension is essentially non-statistical and must be based on 
subject matter knowledge and intuition. 


6. A conceptual framework for analyses; the population model. In the
previous sections we have described an experimental situation and procedure which,
at least formally (and granted agreement on the meaning and necessary
procedure implied by “random”), is non-controversial. We propose now to provide a
conceptual framework for the statistical analysis. This will naturally require
some assumptions, all of which we will attempt to make elementary, in the sense
that their implications are easily appreciated, and explicit.

We postulate the existence of a real (unknown) number $Y_{ijkm}$, which represents
the “true” (or “typical”) response if unit m is subjected to the treatment
combination consisting of the ith level of $\mathcal{A}$, jth level of
$\mathcal{B}$, and kth level of $\mathcal{C}$; and we take as our immediate
framework of statistical concern the conceptual set $\{Y_{ijkm}\}$, and more
particularly certain functions defined on the elements of this set. Several
presumptions are implicit in the preceding sentences. First, the scale of
observation is considered as “given,” though our succeeding discussion could
proceed equally well in terms of any function of Y. This is not to imply that any
scale for analysis is as informative as any other. Second, for the quantity
$Y_{ijkm}$ to have meaning by itself it is a necessary assumption that the
response from a given treatment on a given unit be dependent only on that
treatment and unit, and not on the over-all configuration of other treatments
and other units; this excludes certain experiments such as those involving
competition in animal-feeding trials. Third, we assume that the notion of “true”
or “typical” response can be given an objective meaning in the given situation.

Proceeding now on the basis of the previous paragraph, we know that if we
actually subjected unit m to factor combination (ijk), we would not in general
observe the true response, $Y_{ijkm}$, owing to inevitable errors in treatment
application, in response measurement, and variations for a given unit owing to its
“physical state.” These types of errors we refer to as technical errors. These
technical errors have no relation to the formal randomization procedure but
belong to the conceptual framework. Consequently, in a general study of this
sort we have three alternatives with respect to technical errors: (i) To deal with
the “ideal” case where such errors are not considered, with the understanding
that the application of the method and results in specific situations would require
some extensions, depending on “reasonable” assumptions in the specific case.
(ii) To employ simple assumptions, which are popular, easily understood, and
often reasonable, again with the understanding that adjustment may be
necessary to meet specific situations. (iii) To attempt to carry technical errors
with some sort of “maximum generality.” Procedure (ii) appeared to us to be the
most useful.


Accordingly, we will assume that if combination (ijk) were applied to unit m,
then we would observe

$$y_{ijkm} = Y_{ijkm} + \epsilon_{ijkm},$$

where the $\epsilon_{ijkm}$, representative of combined technical errors, can be
treated as random variables which are mutually uncorrelated with mean 0 and common
variance $\sigma^2$.

Some directions of increasing generality of assumptions would be (i) relaxing 
the homogeneity assumptions to, say, variance (€ijzm) = o» ; (ii) relaxing the 
homogeneity assumptions to, say, variance (€:jem) = a7 ; (iii) Yisem follows some 
distribution F;jzm(y) of which Y;jem is some parameter. It is easy to see that the 
results we shall give are in fact essentially valid if generalization (i) above is 
permitted; we have not built it in explicitly to simplify the presentation and 
lay clear some aspects of the results. Furthermore, the results on ems (expecta- 
tion of mean squares) are essentially valid if generalization (ii) is allowed. 

Anticipating its utility in the succeeding section we can now write down the
population model as

$$y_{ijkm} = \mu + a_i + b_j + c_k + (ab)_{ij} + (ac)_{ik} + (bc)_{jk} + (abc)_{ijk} + p_m + q_{ijkm} + \epsilon_{ijkm}.$$

No further assumptions are involved in this decomposition, which is based on
an algebraic identity involving means and deviations over the array
$\{Y_{ijkm}\}$. The explicit definitions and physical interpretations of the
components of the population model are delayed to Section 12 below. We note here
that while the detail of the population model depends only on the experimental
situation, the specific breakdown which we employ is determined by the design,
since it will turn out that certain of the components of the model are estimable.
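
The statement that the population model rests on an algebraic identity in means
and deviations can be made concrete. The explicit definitions are deferred to
Section 12, which is not reproduced here, so the sketch below assumes the usual
definitions: the general mean, deviations of marginal means from it,
interaction terms as successive remainders, a unit effect $p_m$ taken as the
deviation of the unit mean, and $q_{ijkm}$ as what is left over. All names and
the made-up array of true responses are hypothetical.

```python
from fractions import Fraction
from itertools import product
import random


def decompose(Y, A, B, C, P):
    """Split Y[i, j, k, m] into means and deviations (assumed definitions)."""
    cells = list(product(range(A), range(B), range(C), range(P)))

    def mean(fixed):
        """Mean of Y over cells whose indices match `fixed` (position -> value)."""
        chosen = [t for t in cells if all(t[pos] == v for pos, v in fixed.items())]
        return Fraction(sum(Y[t] for t in chosen), len(chosen))

    mu = mean({})
    a = [mean({0: i}) - mu for i in range(A)]
    b = [mean({1: j}) - mu for j in range(B)]
    c = [mean({2: k}) - mu for k in range(C)]
    p = [mean({3: m}) - mu for m in range(P)]
    ab = {(i, j): mean({0: i, 1: j}) - mu - a[i] - b[j]
          for i in range(A) for j in range(B)}
    ac = {(i, k): mean({0: i, 2: k}) - mu - a[i] - c[k]
          for i in range(A) for k in range(C)}
    bc = {(j, k): mean({1: j, 2: k}) - mu - b[j] - c[k]
          for j in range(B) for k in range(C)}
    abc = {(i, j, k): mean({0: i, 1: j, 2: k}) - mu - a[i] - b[j] - c[k]
                      - ab[i, j] - ac[i, k] - bc[j, k]
           for i in range(A) for j in range(B) for k in range(C)}
    # Unit-treatment interaction: what remains after the treatment mean and p_m.
    q = {(i, j, k, m): Y[i, j, k, m] - mean({0: i, 1: j, 2: k}) - p[m]
         for i, j, k, m in cells}
    return mu, a, b, c, ab, ac, bc, abc, p, q


# Hypothetical true responses for A, B, C, P = 2, 3, 2, 4.
rng = random.Random(0)
A, B, C, P = 2, 3, 2, 4
Y = {t: rng.randint(0, 20) for t in product(range(A), range(B), range(C), range(P))}
mu, a, b, c, ab, ac, bc, abc, p, q = decompose(Y, A, B, C, P)

# The identity holds exactly for every cell, whatever the values of Y.
for (i, j, k, m), y in Y.items():
    rebuilt = (mu + a[i] + b[j] + c[k] + ab[i, j] + ac[i, k] + bc[j, k]
               + abc[i, j, k] + p[m] + q[i, j, k, m])
    assert rebuilt == y
```

Adding an uncorrelated technical error to each reconstructed true response
would then give the observable $y_{ijkm}$ of the population model.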


7. The statistical model; the function of randomization. We turn now to a
consideration of the actual experimental observables. Let $x_{i^*j^*k^*f}$ denote
the fth replicate observation obtained from selected factor combination
$(i^*j^*k^*)$, where $f = 1, 2, \ldots, n_{i^*j^*k^*}$. Since $x_{i^*j^*k^*f}$ is
obtained from some one experimental unit, each $(i^*j^*k^*f)$ corresponds to some
value of m, the experimental unit index. Against the background of the previous
section we may regard the statistical effect of our experiment as giving a random
(within the well-defined restrictions of the experimental design) sample, the
$\{x_{i^*j^*k^*f}\}$, from the set of random variables $\{y_{ijkm}\}$; i.e., a
restricted random sample of size $\sum_{i^*j^*k^*} n_{i^*j^*k^*}$ from the ABCP
populations specified by the random variables $\{y_{ijkm}\}$.

It is appropriate to discuss here the function of randomization in this
experimental design. Clearly, if we could observe the entire set $\{Y_{ijkm}\}$,
we would know everything (empirically) possible about the experimental situation
under consideration. Alternatively, if we could obtain observations on each member
of the set $\{y_{ijkm}\}$, then only the technical errors $\{\epsilon_{ijkm}\}$
would be involved in our inferences about functions defined on elements of the
set $\{Y_{ijkm}\}$. However, we are in general able to observe only a subset of
the $\{y_{ijkm}\}$, and hence our inferences will be influenced by additional
variabilities. The function of randomization is to attempt to control, in a
statistical sense, these additional variabilities, and to enable us, perhaps, to
obtain valid estimates of the uncertainties of inferences.


We incorporate the restrictions of the experimental design with the population
model to obtain a statistical model for the observations, $\{x_{i^*j^*k^*f}\}$, in terms
of parameters defined on elements of the set $\{Y_{ijkm}\}$ and of random variables
which reflect (and define) the restrictions of the design. This statistical model 
has the advantage that it, together with the properties of its components, sum- 
marizes sufficiently all the relevant statistical knowledge and assumptions for 
the experiment. In addition certain results on linear estimation, variances of 
estimates, and expectations of analysis of variance mean squares may be de- 
rived by elementary algebraic operations using the statistical models. Further- 
more there would be nothing more difficult than heavy algebra involved in 
obtaining more complex results, such as variances of mean squares, using the 
statistical model. It is to be expected, however, that more purely combinatorial 
arguments will shorten the process with regard to particular attributes (cf. 
Tukey [11] and Hooke [7]). 

Full detail on the necessary additional notation and definitions needed to
write down the statistical model is delayed till Section 12. At this point we note
that the statistical model takes the form

$$x_{i^*j^*k^*f} = \mu + a_{i^*} + b_{j^*} + c_{k^*} + (ab)_{i^*j^*} + (ac)_{i^*k^*} + (bc)_{j^*k^*} + (abc)_{i^*j^*k^*} + p_{i^*j^*k^*f} + q_{i^*j^*k^*f} + \epsilon_{i^*j^*k^*f},$$

where, for example, $a_{i^*} = \sum_i \alpha_i^{i^*} a_i$, the $a_i$ being
parameters from the population model, the $\alpha_i^{i^*}$ being random variables
which take on values zero or one with joint probability distribution specified by
the experimental design. (In particular, for $a_{i^*}$ the relevant item in the
design is the random selection of a levels of factor $\mathcal{A}$ from the
population of A levels.)

It is apparent from the subscripts in the above model that the last three com- 
ponents are mutually confounded, but their separation in the model is of im- 
portance because their statistical properties and experimental content are not 
alike. 

The formal resemblance of the above statistical model (which may be appro- 
priately called a definitional type model) to the usual ‘‘assumed linear models 
said to underly the analysis of variance’’ will be apparent and is not fortuitous. 
We note for emphasis that the model above depends only on the assumptions
given in Section 6 above and not on any detailed knowledge or assumption
concerning the mechanism (behaviour) of the experimental factors or units.

An extension of the application of the statistical model which we shall con- 
sider in this paper only very superficially would be to deduce certain elementary 
properties of the terms on the right-hand side (e.g., means and variances) and 
employ these with sufficient homogeneity and distributional assumptions to 
suggest a modified mathematical model which is more tractable from the point
of view of “exact” distribution theory (cf. Scheffé [8]).


8. Succeeding sections. We invert the logical order of development by giving, 
in succeeding sections, results on expectations of analysis of variance mean 
squares (ems) in advance of definitions, notation, and derivations underlying 
these results. This is done because many who may be interested in the structure 
of these results will have much less concern with the detail of their derivation. 

In Section 9 we deal with the case of proportional numbers (defined below)
and an orthogonal⁵ analysis of variance based on weighted cell means; in Section
10 we consider the case of general numbers and a nonorthogonal analysis based
on unweighted cell means; Section 11 deals with the special case of one factor.
In addition to general formulae for expectations of mean squares, some questions 
of estimability of components of variation and of “proper error terms” are taken 
up. 

In Section 12 we give details concerning the population and statistical models,
explicit definitions of the components of variation, an example of the use of the
statistical model in deriving ems, and discussion of various complements such
as the physical interpretation of the parameters of the population model,
relation of non-additivity to scale of observation, etc.

⁵ We use this term to refer to a decomposition in which the individual sums of squares
sum to the so-called total sum of squares.

⁶ Expectations of mean squares.

In Section 13 we describe briefly a more symmetric form for the results on 
ems which makes the extension to four or more factors very simple indeed. (This 
general pattern has been extended to include other experimental designs and its 
over-all structure will be described in later communications.) 

Section 14 deals illustratively with problems of linear estimations, errors of 
estimates, and estimation of these errors, using the statistical model for these 
considerations. 

Certain problems connected with the different roles of fixed and random factors 
and the need for functional structure analysis in the former are discussed by 
Wilk and Kempthorne [15] and will not be treated here. 


9. The case of three factors, proportional numbers, no additivity assumptions.
We present in this section results on expectations of analysis of variance mean
squares (henceforth referred to as ems) for the experimental situation and design
given in Section 4, employing the conceptual framework described in Section
6, under the restriction that the number of observations in the subclasses fulfill
the condition that

$$n_{i^*j^*k^*} = r\,u_{i^*} v_{j^*} w_{k^*},$$

where r is the highest common factor of the $\{n_{i^*j^*k^*}\}$. Such a condition
is often known as that of “proportional numbers.”

Under these conditions an orthogonal analysis of variance, based on weighted 
means, exists. A case of “proportional numbers” can arise quite naturally when 
there are unequal numbers of observations corresponding to only one factor of 
classification. 

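The algebraic structure of the mean squares for such an analysis is well known;
for example, the mean squares for the main effects of $\mathcal{A}$ and for the
interaction of $\mathcal{A}$ with $\mathcal{B}$ are

$$\frac{1}{a-1}\sum_{i^*j^*k^*f} (x_{i^*...} - x_{....})^2 \quad\text{and}\quad \frac{1}{(a-1)(b-1)}\sum_{i^*j^*k^*f} (x_{i^*j^*..} - x_{i^*...} - x_{.j^*..} + x_{....})^2,$$

where the usual dot convention is used to denote means.
We shall have use for the following notation:

$$U = \sum_{i^*} u_{i^*}; \quad V = \sum_{j^*} v_{j^*}; \quad W = \sum_{k^*} w_{k^*};$$

$$U^* = \sum_{i^*} u_{i^*}^2 / U^2; \quad V^* = \sum_{j^*} v_{j^*}^2 / V^2; \quad W^* = \sum_{k^*} w_{k^*}^2 / W^2.$$

(Note that for the case of equal numbers U = a, V = b, W = c, U* = 1/a,
V* = 1/b, W* = 1/c. Of course, in general, U* ≤ 1, U* ≥ 1/a.) Employing
this notation, and recalling that f has range $1, 2, \ldots, r u_{i^*} v_{j^*} w_{k^*}$,
we obtain for the same two mean squares

$$\frac{rVW}{a-1}\sum_{i^*} u_{i^*} (x_{i^*...} - x_{....})^2,$$

$$\frac{rW}{(a-1)(b-1)}\sum_{i^*}\sum_{j^*} u_{i^*} v_{j^*} (x_{i^*j^*..} - x_{i^*...} - x_{.j^*..} + x_{....})^2.$$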
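
The proportional-numbers condition and the weight summaries U and U* are easy
to experiment with. The sketch below is illustrative only; the helper names are
hypothetical, and the proportionality check uses the equivalent marginal-total
criterion $n_{ijk} N^2 = n_{i..}\, n_{.j.}\, n_{..k}$ (with N the total number
of observations), which is an assumption introduced here rather than a formula
from the text.

```python
from fractions import Fraction


def cell_numbers(r, u, v, w):
    """Cell numbers n[i][j][k] = r * u[i] * v[j] * w[k]."""
    return [[[r * ui * vj * wk for wk in w] for vj in v] for ui in u]


def is_proportional(n):
    """True when n[i][j][k] * N^2 equals the product of its three margins."""
    a, b, c = len(n), len(n[0]), len(n[0][0])
    row = [sum(n[i][j][k] for j in range(b) for k in range(c)) for i in range(a)]
    col = [sum(n[i][j][k] for i in range(a) for k in range(c)) for j in range(b)]
    lay = [sum(n[i][j][k] for i in range(a) for j in range(b)) for k in range(c)]
    total = sum(row)
    return all(n[i][j][k] * total ** 2 == row[i] * col[j] * lay[k]
               for i in range(a) for j in range(b) for k in range(c))


def weight_summary(u):
    """Return (U, U*) with U = sum of u and U* = (sum of u^2) / U^2."""
    U = sum(u)
    return U, Fraction(sum(x * x for x in u), U * U)


# Hypothetical weights; r = 2 is the highest common factor of the cell numbers.
u, v, w = [1, 2], [1, 1, 3], [1, 2]
n = cell_numbers(2, u, v, w)
assert is_proportional(n)

U, U_star = weight_summary(u)
assert Fraction(1, len(u)) <= U_star <= 1          # 1/a <= U* <= 1

# Equal numbers: U = a and U* = 1/a, as noted in the text.
assert weight_summary([1, 1, 1]) == (3, Fraction(1, 3))
```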

General results on ems for this analysis are given in Table 1. The definitions
of the components of variation⁷ which appear in the table are given in Section 12.
For all the o”’s and Q”’s with the exception of 0} the definition is such that they 
are a sum of squares of quantities divided by the number of linearly independent 
relations among these quantities. The subscript notation is intended to be sug- 
gestive; for example, o2 is a measure of dispersion of the population parameters 
{a;} which are the “main effects” of the levels of @; o reflects the dispersion 
of the population of interactions of levels of @ with levels of ®; Qi, reflects the 
dispersion of the interactions of levels of A with experimental units; etc. (See 
Section 12 for further detail.) The definition of oj requires a little comment. 
It is defined as 


2 1 2 
%« “ ABC(P — 1) 2, 7 
while the number of linearly independent relations among the set |qijkm} is 
(ABC — 1) (P — 1). The reason for this definition is partly because o. appears 
in the ems for the residual and partly to simplify the formulae in Table 1. (Later, 
when we put Table 1 in a more symmetric form in Section 13, this disturbance 
will be eliminated.) The only distinction between the $Q^2$’s and the $\sigma^2$’s is that
the former all reflect interactions of treatments with experimental units. The
distinctive notation was employed to make this readily apparent in the table.
The results of Table 1 indicate that, in general, unbiased estimates of $\sigma_a^2$, $\sigma_b^2$,
$\sigma_c^2$, $\sigma_{ab}^2$, etc., cannot be obtained from the analysis of variance mean squares if
unit-treatment interactions are not negligible. The corresponding statement for
the appropriate denominator in a test of significance criterion is complicated by
possible ambiguity with respect to the null hypothesis of concern. But it is
apparent that in a test of significance concerning, for example, the main effects of
levels of $\mathcal{A}$ (see definitions of Section 12), we cannot in general find a
"denominator" whose expectation is

\[ E(A^*) - rUVW \frac{(1 - U^*)}{(a - 1)} \sigma_a^2. \]


The question may arise as to whether it is in fact components such as $\sigma_a^2$


7 We refer to these quantities as "components of variation" rather than as "components
of variance" to avoid possible ambiguity, since they are in fact measures of dispersion for
the population of quantities on which they are defined, and are not, in the usual meaning of
the word, variances of random variables.

8 This "bias" in the analysis of variance is the generalization of a similar result for a
simpler situation, given by Wilk [12], [13].
TABLE 1
Expectations of mean squares for orthogonal analysis of variance

[Table body illegible in the source. Columns: Mean square | Expectation. Rows: $A^*$, $B^*$, $C^*$, $I^*_{AB}$, $I^*_{AC}$, $I^*_{BC}$, $I^*_{ABC}$, $R^*$.]


which are of interest rather than the linear combination $[\sigma_a^2 - (1/P)Q_{ap}^2]$.
The answer to this lies in an examination of the quantities $\{a_i\}$ whose dispersion
makes up $\sigma_a^2$. By definition $a_i = Y_{i...} - Y_{....}$ and is thus the deviation of the
average "true" response from level $i$ of $\mathcal{A}$, in combination with all levels of all
other factors and all experimental units, from the over-all average from all
levels of all factors on all units. We refer to $a_i$ as the main effect of the $i$th level
of $\mathcal{A}$. The difference between two such main effects $a_i - a_{i'}$ measures the
difference (averaged over all levels of all other factors and all units) between
operating at level $i$ of $\mathcal{A}$ and level $i'$ of $\mathcal{A}$. On the other hand the combination

\[ [\sigma_a^2 - (1/P)Q_{ap}^2] \]
is not always necessarily positive (though it will be in most cases of practical


TABLE 2
Error terms

[Table body illegible in the source. Columns: Classification | Error terms. Rows: $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, $\mathcal{A} \times \mathcal{B}$, $\mathcal{A} \times \mathcal{C}$, $\mathcal{B} \times \mathcal{C}$, $\mathcal{A} \times \mathcal{B} \times \mathcal{C}$.]

interest) and hence is not a measure of dispersion of any quantities defined on
the basic population of true responses $\{Y_{ijkm}\}$.

Two factors tend to decrease the importance of this "bias" in the analysis of
variance due to interactions of treatments with experimental units. First, the
quantity confounded with the component of variation of interest enters in the
ems with coefficient $1/P$. Thus if $P$, the number of experimental units, is large,
then the effect of the confounded term will usually be small. The origin of the
confounded term is the negative correlation induced on observed responses
for a given treatment combination owing to the random assignment of units
from a finite population. As the size of the population of units increases, this
correlation goes to zero. Secondly, each $Q^2$ quantity represents a higher order
interaction term than the component with which it is associated, and it is often
true that the higher the order of the interaction the smaller it will be. The size
of unit-treatment interactions depends somewhat independently on two
considerations, namely, the scale of measurement of the responses and the
heterogeneity of the experimental units. Of course, homogeneous experimental units
will mean additivity of units and treatments on any scale.
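
The negative correlation mentioned above can be made concrete. The sketch below computes, by exact enumeration, the correlation between the unit errors of two distinct units drawn without replacement from a finite population; it equals $-1/(P - 1)$ and so tends to zero as $P$ grows. The function name `draw_correlation` and the example populations are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction
from itertools import permutations

def draw_correlation(p):
    """Exact correlation between the first and second values drawn without
    replacement from the finite population p (every ordered pair equally likely).

    Assumes p is not constant, so that the variance is non-zero.
    """
    P = len(p)
    mean = Fraction(sum(p), P)
    # Variance of a single draw (divisor P, since each unit is equally likely).
    var = sum((Fraction(x) - mean) ** 2 for x in p) / P
    # Covariance over all ordered pairs of distinct units.
    pairs = list(permutations(p, 2))
    cov = sum((Fraction(x) - mean) * (Fraction(y) - mean) for x, y in pairs) / len(pairs)
    return cov / var

small = draw_correlation([3, -1, 4, -6])        # P = 4
larger = draw_correlation(list(range(11)))      # P = 11
```

The result depends only on $P$, not on the particular values in the population, which is why the confounded term carries the coefficient $1/P$.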

Under the assumption that all unit-treatment interactions are zero (i.e., that
$q_{ijkm} = 0$) so-called proper error terms would exist. Table 2 lists error terms for
each classification of the design. The bias in using these error terms when
unit-treatment interactions are not negligible is exemplified by

\[ -rUVW \frac{(1 - U^*)}{(a - 1)} \frac{1}{P} Q_{ap}^2, \]

which is the bias in using the Table 2 error term for $\mathcal{A}$ main effects.


As we shall see in a later section, the device of randomization is fully effective 
in allowing unbiased linear estimation of treatment effects. But essentially un- 
biased error terms will be obtainable from the analysis of variance, in general, 
only when the experimental units are not too heterogeneous or the size, P, of 
the population of units is large, or the scale is such that units and treatment
combinations are additive (in the sense that their interactions on that scale are
zero). There does not appear to be any simple statistical method to overcome
this confounding which is due to the "fractional replication" which is imposed
by the restriction that each unit can be "used only once."

We close this section with a discussion of three special cases which have been
given much attention in the past. For simplicity we reduce our consideration
to those involving two factors, $\mathcal{A}$ and $\mathcal{B}$, putting $C = c = 1$, and shall take $P$
as "very large." The cases we detail are the so-called "fixed," "mixed," and
"random" model situations. The results on ems are then those of Table 3:
o5 = oD + o7. 

The following points from Table 3 are worthy of note: If the numbers of
observations in each "cell" are equal, then $U^* = 1/a$ and $V^* = 1/b$ and then the
component $\sigma_{ab}^2$ vanishes from $E(A^*)$ and from $E(B^*)$ for the fixed case; and
from $E(B^*)$ in the mixed case, where $\mathcal{A}$ is the "fixed" factor, but not from $E(A^*)$,
where $\mathcal{B}$ is the random factor. If the numbers are proportional and not equal
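TABLE 3
Ems for special cases of a two-factor experiment

| Mean square | 1. Fixed: $A = a$, $B = b$ | 2. Mixed: $A = a$, $B > b$ | 3. Random: $A > a$, $B > b$ |
| $A^*$ | $\sigma_0^2 + rUV \frac{(1 - U^*)}{(a - 1)} [(V^* - \frac{1}{b})\sigma_{ab}^2 + \sigma_a^2]$ | $\sigma_0^2 + rUV \frac{(1 - U^*)}{(a - 1)} [V^*\sigma_{ab}^2 + \sigma_a^2]$ | Same as 2. |
| $B^*$ | $\sigma_0^2 + rUV \frac{(1 - V^*)}{(b - 1)} [(U^* - \frac{1}{a})\sigma_{ab}^2 + \sigma_b^2]$ | Same as 1. | $\sigma_0^2 + rUV \frac{(1 - V^*)}{(b - 1)} [U^*\sigma_{ab}^2 + \sigma_b^2]$ |
| $I^*_{AB}$ | $\sigma_0^2 + rUV \frac{(1 - U^*)(1 - V^*)}{(a - 1)(b - 1)} \sigma_{ab}^2$ | Same as 1. | Same as 1. |
| $R^*$ | $\sigma_0^2$ | $\sigma_0^2$ | $\sigma_0^2$ |
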

then, even for these special cases, we do not have simple comparability of the
orthogonal analysis of variance mean squares, as has been pointed out by Smith
[9]. The fact that, for the case of equal numbers in the mixed case, the
component due to interaction remains associated with the fixed factor but not with
the random factor is due, loosely speaking, to our having information on each
observed "random" factor level in combination with every fixed factor level in
our population; but for each fixed factor level we have only a random selection
from the possible random factor levels. The crucial point is that for the case
$\mathcal{A}$ fixed, $\mathcal{B}$ random, $\sigma_a^2$ reflects the dispersion of effects of levels of $\mathcal{A}$ averaged
over all levels of $\mathcal{B}$ and similarly for $\sigma_b^2$; and while every level of $\mathcal{A}$ is used in the
experiment, only a sample of levels of $\mathcal{B}$ are studied.


10. The case of three factors, general numbers, no additivity assumptions. 
In the event that no restrictions are placed on the numbers $n_{i^*j^*k^*}$, except that
they be non-zero, an orthogonal analysis of variance, in which the various sums
of squares all have a meaningful relationship to the experimental situation, for a
multiple factor experiment will not, in general, exist. One can, however, make an
analysis of variance based on cell means. The algebraic structure of such an
analysis is exemplified as follows: Let $A^{**}$ be the mean square associated with $\mathcal{A}$
main effects in this analysis, and let

\[ \tilde{x}_{i^*...} = \frac{1}{bc} \sum_{j^*k^*} \bar{x}_{i^*j^*k^*.}, \qquad \tilde{x}_{....} = \frac{1}{abc} \sum_{i^*j^*k^*} \bar{x}_{i^*j^*k^*.}; \]

then

\[ A^{**} = bc \sum_{i^*} (\tilde{x}_{i^*...} - \tilde{x}_{....})^2 / (a - 1). \]


The table is completed by a line for residual mean square, $R^{**}$, which reflects
"within cell" deviations and is in fact identical with $R^*$ of Section 9. This
analysis is not orthogonal in the sense that the individual sums of squares will not,
in general, sum to the so-called total sum of squares,

\[ \sum_{i^*j^*k^*f} (x_{i^*j^*k^*f} - \bar{x}_{....})^2. \]

The only exception to this last statement (when dealing with two or more
factors) is when the numbers $n_{i^*j^*k^*}$ are all equal.

TABLE 4
Expected mean squares for non-orthogonal analysis of variance

[Table body illegible in the source. Columns: Mean square | Expectation of mean square.]

The statistical model appropriate here is identical with that used for Section 9 
and is developed in Section 12. Table 4 gives the ems for this analysis, with no 
additivity assumptions. We employ the notation 


\[ \frac{1}{n^*} = \frac{1}{abc} \sum_{i^*j^*k^*} \Big( \frac{1}{n_{i^*j^*k^*}} \Big) = \text{average value of elements of the set } \Big\{ \frac{1}{n_{i^*j^*k^*}} \Big\}. \]


Definitions of components of variation are the same as in Section 9 and are 
detailed in Section 12. 

The advantage attached to this analysis of variance is the simple structure 
of the expectations of mean squares, as opposed to the very complex relations 
exhibited in Table 1. In fact, if all mean squares in Table 4, except R**, are 
adjusted by multiplying by n* then, speaking rather loosely, this analysis may 
be superficially interpreted in a similar way to an analysis for a case with equal 
numbers in the cells. (For equal numbers, n* becomes simply r.) 
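
As a small illustration of the adjustment factor, the sketch below computes $n^*$ from a list of cell counts using the relation that $1/n^*$ is the average of the reciprocals $1/n_{i^*j^*k^*}$. The function name `n_star` and the example counts are hypothetical.

```python
from fractions import Fraction

def n_star(cell_counts):
    """n*, defined through 1/n* = average of 1/n over the observed cells."""
    counts = list(cell_counts)
    if not counts or any(n <= 0 for n in counts):
        raise ValueError("every cell must contain at least one observation")
    mean_reciprocal = sum(Fraction(1, n) for n in counts) / len(counts)
    return 1 / mean_reciprocal

# Equal numbers r = 3 in every cell: n* reduces to r.
equal_case = n_star([3, 3, 3, 3])

# Unequal numbers: n* is the harmonic mean, which does not exceed the
# arithmetic mean of the cell counts.
unequal_case = n_star([1, 2, 4, 4])
```

Because $n^*$ is a harmonic mean, sparsely filled cells pull it down more strongly than they would pull down the arithmetic mean of the cell counts.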

The discussion given in the preceding section in connection with difficulties
when unit-treatment interactions are not negligible applies also to the
non-orthogonal analysis. If unit-treatment interactions are negligible, then one can
obtain from linear combinations of the mean squares of the non-orthogonal
analysis unbiased estimates of the various components of variation of interest.
For example, with negligible unit-treatment interactions an unbiased estimate
of $\sigma_a^2$ is given by


1 ,— 
The relation of this to the selection of appropriate “error terms” to serve as de- 
nominators in F-type comparisons of mean squares will be apparent. The rela- 
tion to the estimation of variances of linear estimates is no less immediate and 
is dealt with explicitly in Section 14. 

If one has a situation involving proportional but unequal numbers, the ques- 
tion arises whether one should employ the orthogonal analysis based on weighted 
means or the non-orthogonal analysis based on unweighted means. In the present 
state of knowledge it appears to be a matter of taste, convenience, and opinion 
as to which analysis is more advantageous. (Some recent relevant references on
this point are Cox [2] and Tukey [11].)

The non-orthogonal analysis has the advantages of wider generality, easier
computations, simpler terms and more direct connection with the estimation
of linear contrasts among treatment effects. Furthermore, speaking very loosely,
the non-centrality enters into the mean squares of the non-orthogonal analysis
in a more easily appreciated and more symmetric fashion than for the
orthogonal analysis.

The questions of efficiency of estimation of components of variation and of
sensitivity of significance, as regards these two analyses, are still open.


11. One factor, general numbers. The case of two factors may be obtained as 
a special case of the three-factor development by putting C = c = 1. We will 
not deal with it explicitly. The case of one factor can be obtained by putting 
B = b = 1, in addition. Because of some peculiarities in this situation we give 
some brief discussion below. 

The one factor case corresponds to the within and between analysis of variance 
and has the associated property that an orthogonal analysis always exists what- 
ever the numbers of observations corresponding to the various levels tested. On 
the other hand one still has the choice as to whether to analyze weighted or
unweighted means of observations corresponding to the levels tested.


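The residual mean square is the same for both analyses. For the proportional
analysis

\[ A^* = \frac{1}{(a - 1)} \sum_{i^*} n_{i^*} (\bar{x}_{i^*.} - \bar{x}_{..})^2, \]

where

\[ \bar{x}_{i^*.} = \frac{1}{n_{i^*}} \sum_f x_{i^*f}, \qquad \bar{x}_{..} = \frac{\sum_{i^*} n_{i^*} \bar{x}_{i^*.}}{\sum_{i^*} n_{i^*}}, \]

and

\[ E(A^*) = \sigma_0^2 + rU \frac{(1 - U^*)}{(a - 1)} \Big[ \sigma_a^2 - \frac{1}{P} Q_{ap}^2 \Big], \]

where $\sigma_0^2 = \sigma_e^2 + \sigma_p^2 + \sigma_q^2$, $n_{i^*} = r u_{i^*}$, $U = \sum_{i^*} u_{i^*}$, $U^* = \sum_{i^*} u_{i^*}^2 / U^2$. For
the non-orthogonal analysis,

\[ A^{**} = \frac{1}{(a - 1)} \sum_{i^*} (\bar{x}_{i^*.} - \tilde{x}_{..})^2, \quad \text{where} \quad \tilde{x}_{..} = \frac{1}{a} \sum_{i^*} \bar{x}_{i^*.}, \]

\[ E(A^{**}) = \frac{1}{n^*} \sigma_0^2 + \Big[ \sigma_a^2 - \frac{1}{P} Q_{ap}^2 \Big], \quad \text{where} \quad \frac{1}{n^*} = \frac{1}{a} \sum_{i^*} \frac{1}{n_{i^*}}. \]
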

Thus, in the non-orthogonal analysis of variance equal weight is given to each 
observed level of the factor. In the case of a single factor there does not appear, 
offhand at least, to be any basis to suggest that one analysis will be, in general, 
superior to the other. 


12. Derivation of models and ems. Our attention is directed in this section to 
the following main items: (i) definitions and physical interpretations of the 
parameters of the population model; (ii) the explicit development of a formal 
statistical model for the observations; (iii) definitions for the various components 
of variation; (iv) illustration of the use of the statistical model in the derivation 
of ems. 

In Section 6 we gave a conceptual framework for the analysis of the general
three-factor completely randomized experiment. This specified as the background
population a set of $ABCP$ (unknown) numbers, $\{Y_{ijkm}\}$, the "true" or "typical"
responses. A useful and meaningful representation of these is the one implied
by the definitions

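\[ \mu = Y_{....}, \]
\[ a_i = Y_{i...} - \mu, \]
\[ b_j = Y_{.j..} - \mu, \]
\[ c_k = Y_{..k.} - \mu, \]
\[ (ab)_{ij} = Y_{ij..} - Y_{i...} - Y_{.j..} + \mu, \]
\[ (ac)_{ik} = Y_{i.k.} - Y_{i...} - Y_{..k.} + \mu, \]
\[ (bc)_{jk} = Y_{.jk.} - Y_{.j..} - Y_{..k.} + \mu, \]
\[ (abc)_{ijk} = Y_{ijk.} - Y_{ij..} - Y_{i.k.} + Y_{i...} - (bc)_{jk}, \]
\[ p_m = Y_{...m} - \mu, \]
\[ q_{ijkm} = Y_{ijkm} - Y_{ijk.} - Y_{...m} + \mu. \]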


It is easy to check that the sum of all the components defined above is
identically equal to $Y_{ijkm}$. We have now

\[ (1 + A + B + C + AB + AC + BC + ABC + P + ABCP) \]

quantities, in place of our original $ABCP$, but the following properties indicate
the dependencies:

\[ 0 = \sum_i a_i = \sum_j b_j = \sum_k c_k = \sum_i (ab)_{ij} = \sum_i (ac)_{ik} = \sum_j (bc)_{jk} = \sum_i (abc)_{ijk} = \sum_m p_m = \sum_m q_{ijkm} = \sum_{ijk} q_{ijkm}. \]


These relations follow by definition of the parameters and not by assumption. 
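
The decomposition and its dependencies can be verified on any array of "true" responses. The sketch below is an illustration under stated assumptions: the responses are an arbitrary integer-valued dictionary `Y` keyed by `(i, j, k, m)` (invented here, not data from the paper), and the helper names `decompose` and `reassemble` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def decompose(Y, A, B, C, P):
    """Population-model components of an integer-valued dict Y[(i, j, k, m)]."""
    ranges = (range(A), range(B), range(C), range(P))

    def mean(i=None, j=None, k=None, m=None):
        # Average Y over every index left as None (the "dot" convention).
        fixed = (i, j, k, m)
        cells = list(product(*[r if f is None else [f] for r, f in zip(ranges, fixed)]))
        return Fraction(sum(Y[c] for c in cells), len(cells))

    mu = mean()
    a = {i: mean(i=i) - mu for i in range(A)}
    b = {j: mean(j=j) - mu for j in range(B)}
    c = {k: mean(k=k) - mu for k in range(C)}
    ab = {(i, j): mean(i=i, j=j) - mean(i=i) - mean(j=j) + mu
          for i in range(A) for j in range(B)}
    ac = {(i, k): mean(i=i, k=k) - mean(i=i) - mean(k=k) + mu
          for i in range(A) for k in range(C)}
    bc = {(j, k): mean(j=j, k=k) - mean(j=j) - mean(k=k) + mu
          for j in range(B) for k in range(C)}
    abc = {(i, j, k): mean(i, j, k) - mean(i=i, j=j) - mean(i=i, k=k)
           + mean(i=i) - bc[(j, k)]
           for i in range(A) for j in range(B) for k in range(C)}
    p = {m: mean(m=m) - mu for m in range(P)}
    q = {(i, j, k, m): Y[(i, j, k, m)] - mean(i, j, k) - mean(m=m) + mu
         for i, j, k, m in product(*ranges)}
    return dict(mu=mu, a=a, b=b, c=c, ab=ab, ac=ac, bc=bc, abc=abc, p=p, q=q)

def reassemble(comp, i, j, k, m):
    """Sum of all components for one (i, j, k, m); gives back Y exactly."""
    return (comp["mu"] + comp["a"][i] + comp["b"][j] + comp["c"][k]
            + comp["ab"][(i, j)] + comp["ac"][(i, k)] + comp["bc"][(j, k)]
            + comp["abc"][(i, j, k)] + comp["p"][m] + comp["q"][(i, j, k, m)])

# Arbitrary illustrative responses with main effects, interactions and
# unit-treatment interaction all present.
A, B, C, P = 2, 3, 2, 3
Y = {(i, j, k, m): 7 * i + 3 * j * j - 5 * k + i * j * k + m * (i + 2 * j)
     for i, j, k, m in product(range(A), range(B), range(C), range(P))}
comp = decompose(Y, A, B, C, P)
```

Exact rational arithmetic is used so that the identities hold without rounding; the sum-to-zero relations follow from the definitions for any choice of `Y`.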

The quantities defined above can be given physical interpretation. We shall 
do this for representative cases: 

$\mu$ is the "true" over-all conceptual response if all treatment combinations
were applied to all experimental units.

$a_i$ is the difference between the mean of the "true" responses if all treatments
consisting of the $i$th level of $\mathcal{A}$ in combination with every level of $\mathcal{B}$ and every
level of $\mathcal{C}$ were applied to all experimental units, and $\mu$; we refer to $a_i$ as the
main effect or simply the effect of the $i$th level of $\mathcal{A}$. It should be noted that
$a_i - a_{i'}$ is the difference between the responses due to level $i$ of $\mathcal{A}$ and level $i'$
of $\mathcal{A}$ averaged over all levels of other factors and all units.

$(ab)_{ij}$ is the difference between the effect of the $j$th level of $\mathcal{B}$ in combination
with the $i$th level of $\mathcal{A}$ and the main effect of the $j$th level of $\mathcal{B}$. (The symmetry
between $\mathcal{A}$ and $\mathcal{B}$ is obvious from the definition of $(ab)_{ij}$.) We call $(ab)_{ij}$ the
interaction effect, or simply the interaction, of the $i$th level of $\mathcal{A}$ and the $j$th
level of $\mathcal{B}$.

$p_m$ measures the difference between the mean response from all combinations
of levels of $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ on unit $m$ compared to $\mu$. Thus the $p_m$ measure the
(average) variability of units with respect to the treatment combinations.
Because of the direction of our interest, we refer to the $p_m$ as the additive unit
errors or simply as unit errors.

$q_{ijkm}$ is similarly seen to represent the interaction of treatment combination
$(ijk)$ with unit $m$; and we refer to the $q_{ijkm}$ as interactive unit errors or
unit-treatment interactions.

A number of items deserve explicit mention even though some have been
indicated in the literature, specifically by Yates [16, 17, 18] and others. (1) The
definitions of effects and interactions are relative to a given scale of measurement
of response. Transformations of the scale would lead to radically different effects
and interactions associated with the treatment population. To speak of the
main effect of level $i$ of factor $\mathcal{A}$ as being large is meaningless unless a particular
scale of response is implicit. Similarly the entire concept of interaction has
meaning only relative to a given scale. Two factors can, with no contradiction,
have negligible interactions on one scale and large interactions on another scale.
(2) For a given scale of measurement of response, the definition (and
interpretation) of, say, the effect of the $i$th level of $\mathcal{A}$ depends not only on all other
levels of $\mathcal{A}$ included in the experimental population but also on all levels of all
other factors as well as on the relevant population of experimental units. The
generalization to other effects and interactions is immediate. It is of interest and
importance to note that the difference of the effects, say $a_i - a_{i'}$, of two levels
of $\mathcal{A}$ becomes independent of other levels of $\mathcal{A}$ under consideration but remains
entirely dependent on the levels of the other factors and the population of
experimental units. (3) If we have a scale of observation such that, for instance,
all interactions $(ab)_{ij}$ are negligible, or $\mathcal{A}$ and $\mathcal{B}$ are additive, then the difference
$a_i - a_{i'}$ becomes independent of which levels of $\mathcal{B}$ are included in the study.
This points up the enormous simplification in the summarization of relevant
information and in understanding of the situation which is effected when we can
operate on a scale in which interactions may be neglected. (4) If the levels of
a factor are essentially identical in terms of their influence on response, then, on
any scale, interactions with that factor will be negligible. Similarly, if
experimental units are fairly homogeneous, then one would expect that, for most
scales of observations, the variability of units would be largely described by the
unit errors, $p_m$, and the unit-treatment interactions would be negligible. (5)
The "reparametrization" of the population of "true responses" to effects,
interactions, and unit errors focuses our attention on summary properties of the
experimental situation. This has the advantages that (i) the analysis of variance
mean squares are interpretable in terms of these parameters, which have a
physical interpretation; (ii) knowledge of certain of the parameters is often
essentially the information we desire from the experiment; (iii) by means of
the decomposition given by the population model it is often simpler to appreciate
and evaluate assumptions which may be implicit or explicit in a particular
procedure or inference.

We turn our attention now to the development of a formal usable statistical
model, for the actual experimental observations, in terms of the parameters
defined above. We recall that our experiment could be regarded as giving us a
random (within the restrictions of the design) sample, the $\{x_{i^*j^*k^*f}\}$, of size
$\sum_{i^*j^*k^*} n_{i^*j^*k^*}$, from the set of $ABCP$ populations represented by the random
variables $\{y_{ijkm}\}$, where $y_{ijkm} = Y_{ijkm} + e_{ijkm}$. To write an explicit model
for the $x_{i^*j^*k^*f}$, it is useful and convenient to employ certain "dummy" random
variables which we now proceed to define.

Let $\alpha_{i^*}^{i} = 1$ if selected level $i^*$ of factor $\mathcal{A}$ corresponds to level $i$ in the
population of levels of $\mathcal{A}$;

$\alpha_{i^*}^{i} = 0$ otherwise.

Thus, if the 2nd selected level of factor $\mathcal{A}$ corresponds to the 5th population level
of factor $\mathcal{A}$, then $\alpha_2^5 = 1$.
Similarly we define the sets $\{\beta_{j^*}^{j}\}$ and $\{\gamma_{k^*}^{k}\}$.
Because of the specification of random selection these quantities are random
variables some of whose distributional properties are easily written down. For
example: (1) The sets $\{\alpha_{i^*}^{i}\}$, $\{\beta_{j^*}^{j}\}$, $\{\gamma_{k^*}^{k}\}$ are groupwise statistically independent;
(2) $P(\alpha_{i^*}^{i} = 1) = 1/A$; (3) $P(\alpha_{i^*}^{i} \alpha_{i^*}^{i'} = 0) = 1$, $i \ne i'$; (4) $P(\alpha_{i^*}^{i} \alpha_{i^{*\prime}}^{i'} = 1) =
1/(A(A - 1))$, $i^* \ne i^{*\prime}$, $i \ne i'$; (5) $P(\beta_{j^*}^{j} = 0) = (B - 1)/B$; etc. We note
that the $\alpha$'s, $\beta$'s, and $\gamma$'s are associated with the random selection of the factor
levels to be tested.
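
The listed properties of the selection indicators can be confirmed by exact enumeration. The sketch below assumes that the $a$ selected levels are drawn without replacement from the $A$ population levels with every ordered draw equally likely; the function name `alpha_probabilities` and the zero-based indexing are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations

def alpha_probabilities(A, a):
    """Exact probabilities for the selection indicators (requires a >= 2).

    Each draw d is a tuple with d[i_star] = population level placed in
    selected position i_star.
    """
    draws = list(permutations(range(A), a))
    n = len(draws)
    # P(alpha_{i*}^{i} = 1), taking i* = 0 and i = 0.
    p_single = Fraction(sum(d[0] == 0 for d in draws), n)
    # P(alpha_{i*}^{i} alpha_{i*'}^{i'} = 1), taking i* = 0, i*' = 1, i = 0, i' = 1.
    p_pair = Fraction(sum(d[0] == 0 and d[1] == 1 for d in draws), n)
    # P(alpha_{i*}^{i} alpha_{i*}^{i'} = 1) with i != i': one selected
    # position cannot correspond to two different population levels.
    p_same_position = Fraction(sum(d[0] == 0 and d[0] == 1 for d in draws), n)
    return p_single, p_pair, p_same_position

p_single, p_pair, p_same_position = alpha_probabilities(5, 3)
```

With $A = 5$ the first two values are $1/A$ and $1/(A(A - 1))$, and the third is zero, in agreement with properties (2), (4), and (3) respectively.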
We turn now to the specification of association of selected treatment
combinations with experimental units. To this end we define
$\delta_m^{i^*j^*k^*f} = 1$ if the $f$th replicate of selected treatment combination $(i^*j^*k^*)$
is tested on unit $m$ of the population of experimental units;
$\delta_m^{i^*j^*k^*f} = 0$, otherwise.
In view of the random selection of units for test and the randomization of
treatment combinations to experimental units, it follows that the $\{\delta_m^{i^*j^*k^*f}\}$ are
random variables with the following properties: (1) They are statistically
independent of the $\alpha$'s, $\beta$'s, and $\gamma$'s defined above; (2) $P(\delta_m^{i^*j^*k^*f} = 1) = 1/P$; (3)
$P(\delta_m^{i^*j^*k^*f} \delta_m^{i^{*\prime}j^{*\prime}k^{*\prime}f'} = 0) = 1$, $(i^*j^*k^*f) \ne (i^{*\prime}j^{*\prime}k^{*\prime}f')$; (4) $P(\delta_m^{i^*j^*k^*f} \delta_{m'}^{i^*j^*k^*f} =
0) = 1$, $m \ne m'$; (5) $P(\delta_m^{i^*j^*k^*f} \delta_{m'}^{i^{*\prime}j^{*\prime}k^{*\prime}f'} = 1) = 1/(P(P - 1))$, $(i^*j^*k^*f) \ne
(i^{*\prime}j^{*\prime}k^{*\prime}f')$, $m \ne m'$; etc.
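It is now simple to write an explicit model for the observations $x_{i^*j^*k^*f}$, as
follows:

\[ x_{i^*j^*k^*f} = \mu + \sum_i \alpha_{i^*}^{i} a_i + \sum_j \beta_{j^*}^{j} b_j + \sum_k \gamma_{k^*}^{k} c_k + \sum_{ij} \alpha_{i^*}^{i} \beta_{j^*}^{j} (ab)_{ij} \]

\[ \qquad + \sum_{ik} \alpha_{i^*}^{i} \gamma_{k^*}^{k} (ac)_{ik} + \sum_{jk} \beta_{j^*}^{j} \gamma_{k^*}^{k} (bc)_{jk} + \sum_{ijk} \alpha_{i^*}^{i} \beta_{j^*}^{j} \gamma_{k^*}^{k} (abc)_{ijk} \]

\[ \qquad + \sum_m \delta_m^{i^*j^*k^*f} p_m + \sum_{ijkm} \alpha_{i^*}^{i} \beta_{j^*}^{j} \gamma_{k^*}^{k} \delta_m^{i^*j^*k^*f} (q_{ijkm} + e_{ijkm}). \]
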
m 7 
The correspondence of terms between this and what is given in Section 7 will
be apparent on inspection.

From the point of view of our development, the random variables in this
model are the $\alpha$'s, $\beta$'s, $\gamma$'s, and $\delta$'s, which take on the values 0 and 1 with
probabilities specified by the experimental design and procedure, and the $e$'s. All other
quantities are regarded as fixed, unknown parameters defined on the array of
"true" responses $\{Y_{ijkm}\}$.

This model, together with the properties of its components, contains all the 
implications of our procedures of random selection and allocation, as well as all 
assumptions we have made in the conceptual frame of reference for the analysis. 
It is in a sense "sufficient" for the general experimental situation and design,
together with the additional assumptions which we made explicit in Section 6.
This model can therefore be employed quite formally in any statistical manipula- 
tion or evaluations of the experiment, without reference to any other features. 

The complexity of the model is only in its initial appearance. It is easy to handle 
in algebraic manipulations and, in particular, makes into an elementary alge- 
braic operation the evaluation of expectations of various functions of the ob- 
servations. 

Toward the end of this section we shall illustrate how the statistical model is 
employed in evaluation of expectations of analysis of variance mean squares. 
Before doing this we give the explicit definitions of the components of variation, 
which have appeared in the ems in previous sections, in terms of the components 
of the population model. These are as follows:

\[ \sigma_a^2 = \frac{1}{A - 1} \sum_i a_i^2; \qquad \sigma_b^2 = \frac{1}{B - 1} \sum_j b_j^2; \qquad \sigma_c^2 = \frac{1}{C - 1} \sum_k c_k^2; \qquad \ldots \]

\[ Q_{bcp}^2 = \frac{1}{(B - 1)(C - 1)(P - 1)} \sum_{jkm} (q_{.jkm} - q_{.j.m} - q_{..km})^2; \]

\[ Q_{abcp}^2 = \frac{1}{(A - 1)(B - 1)(C - 1)(P - 1)} \sum_{ijkm} (q_{ijkm} - q_{ij.m} - q_{i.km} + q_{i..m} - q_{.jkm} + q_{.j.m} + q_{..km})^2. \]

With the exception of $\sigma_q^2$ the definition of these components is according to
the scheme

\[ \frac{\text{(sum of squares of quantities)}}{\text{(no. of quantities} - \text{no. of linear dependencies)}}. \]


While there is no doubt that the $\sigma_a^2$, $\sigma_b^2$, etc., reflect the variability of the
populations $\{a_i\}$, $\{b_j\}$, etc., some further justification for the method of choice
of divisors is in order. An important (and perhaps sufficient) justification is that
such a method of definition simplifies the appearance of the ems and the
variances of certain linear estimates. For further insight we might argue that the
measure of dispersion wanted for, say, the $\{a_i\}$ is essentially that for the $\{Y_{i...}\}$,
a fundamental measure of the dispersion of which is one of Gini's mean
differences, namely, the average of squares of differences between pairs from the
population. For the case of the $\{Y_{i...}\}$ this is

\[ G_a = \frac{1}{A(A - 1)} \sum_{i \ne i'} (Y_{i...} - Y_{i'...})^2 = 2\sigma_a^2. \]

(The factor 2 arises because each pair, in inverted order, appears twice.) The
same argument applies to $\sigma_b^2$, $\sigma_c^2$, and $\sigma_p^2$. For the case of a measure of dispersion
of, say, the $\{(ab)_{ij}\}$, we might argue that this should reflect the magnitude of
interactions in the two-way array $\{Y_{ij..}\}$, a fundamental measure of which is
a mean square "double difference"

\[ G_{ab} = \frac{1}{AB(A - 1)(B - 1)} \sum_{i \ne i',\, j \ne j'} [(Y_{ij..} - Y_{i'j..}) - (Y_{ij'..} - Y_{i'j'..})]^2. \]

Now the quantity in square brackets is identical with
\[ (ab)_{ij} - (ab)_{i'j} - (ab)_{ij'} + (ab)_{i'j'}, \]
and remembering that $\sum_i (ab)_{ij} = \sum_j (ab)_{ij} = 0$, and hence that
\[ \sum_{ij} (ab)_{ij}^2 = -\sum_{j,\, i \ne i'} (ab)_{ij}(ab)_{i'j} = -\sum_{i,\, j \ne j'} (ab)_{ij}(ab)_{ij'} = \sum_{i \ne i',\, j \ne j'} (ab)_{ij}(ab)_{i'j'}, \]
it is easy to find that
\[ G_{ab} = 4\sigma_{ab}^2. \]

(Again, the factor 4 arises because essentially the same quantity is permitted to
appear four times.) The same argument applies to $\sigma_{ac}^2$, $\sigma_{bc}^2$, $Q_{ap}^2$, $Q_{bp}^2$, and $Q_{cp}^2$
and can be extended in the obvious way to $\sigma_{abc}^2$, $Q_{abp}^2$, $Q_{acp}^2$, $Q_{bcp}^2$, and $Q_{abcp}^2$.
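
The two identities relating Gini-type mean differences to the components of variation can be checked on a small two-way array. In the sketch below `Ybar[i][j]` plays the role of $Y_{ij..}$; the array values and the function name `gini_checks` are invented for illustration.

```python
from fractions import Fraction

def gini_checks(Ybar):
    """Return (G_a, sigma_a^2, G_ab, sigma_ab^2) for a two-way array Ybar[i][j]."""
    A, B = len(Ybar), len(Ybar[0])
    grand = Fraction(sum(map(sum, Ybar)), A * B)
    row = [Fraction(sum(r), B) for r in Ybar]                       # Y_{i...}
    col = [Fraction(sum(Ybar[i][j] for i in range(A)), A) for j in range(B)]
    a = [row[i] - grand for i in range(A)]                          # main effects
    ab = [[Ybar[i][j] - row[i] - col[j] + grand for j in range(B)]  # interactions
          for i in range(A)]
    sigma_a2 = sum(x * x for x in a) / (A - 1)
    sigma_ab2 = sum(x * x for r in ab for x in r) / ((A - 1) * (B - 1))
    # Mean squared difference over ordered pairs of distinct levels.
    G_a = sum((row[i] - row[h]) ** 2
              for i in range(A) for h in range(A) if i != h) / (A * (A - 1))
    # Mean squared "double difference" over i != h and j != l.
    G_ab = Fraction(
        sum(((Ybar[i][j] - Ybar[h][j]) - (Ybar[i][l] - Ybar[h][l])) ** 2
            for i in range(A) for h in range(A) if i != h
            for j in range(B) for l in range(B) if j != l),
        A * B * (A - 1) * (B - 1))
    return G_a, sigma_a2, G_ab, sigma_ab2

G_a, sigma_a2, G_ab, sigma_ab2 = gini_checks([[1, 4, 2], [0, 7, 3], [5, 5, 9]])
```

For any array, `G_a` equals twice `sigma_a2` and `G_ab` equals four times `sigma_ab2`, which is the justification offered for the divisors $(A - 1)$ and $(A - 1)(B - 1)$.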

The structure of the $Q^2$ quantities is, from their definition and that of $q_{ijkm}$,
such that they reflect interactions of treatment factors with experimental units.
For example,

\[ Q_{ap}^2 = \frac{1}{(A - 1)(P - 1)} \sum_{im} (Y_{i..m} - Y_{i...} - Y_{...m} + Y_{....})^2, \]

which reflects the interactions of levels of $\mathcal{A}$ with experimental units. In view
of the role which the unit-treatment interaction components of variation play
in the ems, it was felt that a distinctive notation for them would be worth while.
So far as their formal definitions are concerned there are no distinctions between
the $\sigma^2$'s (except $\sigma_q^2$) and the $Q^2$'s.

The essential reasons for the definition of $\sigma_q^2$ which was used are that $\sigma_q^2$
appears in the expectation of the residual mean square and that such a definition
shortens some of the formulae. It is easily checked that

\[ \sigma_q^2 = \frac{(A - 1)}{A} Q_{ap}^2 + \frac{(B - 1)}{B} Q_{bp}^2 + \frac{(C - 1)}{C} Q_{cp}^2 + \frac{(A - 1)(B - 1)}{AB} Q_{abp}^2 \]

\[ \qquad + \frac{(A - 1)(C - 1)}{AC} Q_{acp}^2 + \frac{(B - 1)(C - 1)}{BC} Q_{bcp}^2 + \frac{(A - 1)(B - 1)(C - 1)}{ABC} Q_{abcp}^2. \]

We proceed now to show how to use the statistical model in deriving the
expectation of the $\mathcal{A}$ mean square, $A^* = A'/(a - 1)$, for the case of proportional
numbers (Section 9). This will illustrate the basis for the results given in previous
sections.

We have

\[ A' = \sum_{i^*j^*k^*} n_{i^*j^*k^*} (\bar{x}_{i^*...} - \bar{x}_{....})^2 = rVW \sum_{i^*} u_{i^*} (\bar{x}_{i^*...} - \bar{x}_{....})^2. \]


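The statistical model can now be substituted into this expression, and
determining the expectation becomes a purely algebraic operation when one uses
freely the fact that the expectation of a sum is the sum of the expectations. Thus,

\[ A' = rVW \sum_{i^*} u_{i^*} \{ a^*_{i^*} - a^*_{.} + (ab)^*_{i^*.} - (ab)^*_{..} + (ac)^*_{i^*.} - (ac)^*_{..} \]
\[ \qquad + (abc)^*_{i^*..} - (abc)^*_{...} + p^*_{i^*...} - p^*_{....} + q^*_{i^*...} - q^*_{....} + e^*_{i^*...} - e^*_{....} \}^2, \]

where

\[ a^*_{i^*} = \sum_i \alpha_{i^*}^{i} a_i; \qquad a^*_{.} = \frac{1}{U} \sum_{i^*} u_{i^*} a^*_{i^*}; \]

\[ (ab)^*_{i^*.} = \frac{1}{V} \sum_{j^*} v_{j^*} (ab)^*_{i^*j^*} = \frac{1}{V} \sum_{j^*} v_{j^*} \sum_{ij} \alpha_{i^*}^{i} \beta_{j^*}^{j} (ab)_{ij}; \]

\[ (ab)^*_{..} = \frac{1}{UV} \sum_{i^*j^*} u_{i^*} v_{j^*} (ab)^*_{i^*j^*}; \]

\[ p^*_{i^*...} = \frac{1}{r u_{i^*} VW} \sum_{j^*k^*f} \sum_m \delta_m^{i^*j^*k^*f} p_m; \]

\[ q^*_{i^*...} = \frac{1}{r u_{i^*} VW} \sum_{j^*k^*f} \sum_{ijkm} \alpha_{i^*}^{i} \beta_{j^*}^{j} \gamma_{k^*}^{k} \delta_m^{i^*j^*k^*f} q_{ijkm}; \]

etc.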


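It is easy to check that unlike terms in the above expression are uncorrelated.
For example,

\[ E(a^*_{i^*} p^*_{i^*...}) = E\Big\{ \Big( \sum_i \alpha_{i^*}^{i} a_i \Big) \Big( \frac{1}{r u_{i^*} VW} \sum_{j^*k^*f} \sum_m \delta_m^{i^*j^*k^*f} p_m \Big) \Big\} = \Big( \sum_i a_i E\alpha_{i^*}^{i} \Big) \Big( \frac{1}{r u_{i^*} VW} \sum_{j^*k^*f} \sum_m p_m E\delta_m^{i^*j^*k^*f} \Big) = 0, \]

since the $\alpha$'s and $\delta$'s are independent. But $E(\alpha_{i^*}^{i}) = 1/A$ for all $i$ and $i^*$;
$E(\delta_m^{i^*j^*k^*f}) = 1/P$ for all $m$, $i^*$, $j^*$, $k^*$, and $f$, and $\sum_i a_i = \sum_m p_m = 0$. Similarly, the expectation
of all other cross-product terms may be shown to be zero.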

(In the event that, for example, $\mathcal{A}$ is fixed, i.e., $A = a$, one will in general
not renumber the levels at random, so that in our notation $i^*$ and $i$ would be the
same index in making the formal correspondence. As we mentioned elsewhere,
for symmetric functions of the observations (in our sample involving all levels
of $\mathcal{A}$) no difficulty arises. In the section on linear estimation which involves
non-symmetric functions we shall give an extended notation. For the present, if we
use the convention that when $A = a$, $i^*$ and $i$ are the same index, then, for
example, $\sum_i \alpha_{i^*}^{i} a_i = a_{i^*}$, since $\alpha_{i^*}^{i^*} = 1$ with probability 1 and $\alpha_{i^*}^{i} = 0$, $i \ne i^*$, with
probability 1, using our convention. Then $a^*_{i^*}$ would become $a_{i^*}$, a constant, and
since $E(p^*_{i^*...}) = 0$, the above result and its analogues remain true.)

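Hence

\[ E(A') = rVW \sum_{i^*} u_{i^*} E\{ [a^*_{i^*} - a^*_{.}]^2 + [(ab)^*_{i^*.} - (ab)^*_{..}]^2 + [(ac)^*_{i^*.} - (ac)^*_{..}]^2 \]
\[ \qquad + [(abc)^*_{i^*..} - (abc)^*_{...}]^2 + [p^*_{i^*...} - p^*_{....}]^2 + [q^*_{i^*...} - q^*_{....}]^2 + [e^*_{i^*...} - e^*_{....}]^2 \}. \]

Now,

\[ \sum_{i^*} u_{i^*} E(a^*_{i^*} - a^*_{.})^2 = E\Big\{ \sum_{i^*} u_{i^*} (a^*_{i^*})^2 - \frac{1}{U} \Big( \sum_{i^*} u_{i^*} a^*_{i^*} \Big)^2 \Big\} \]

\[ = E\Big\{ \sum_{i^*} u_{i^*} \Big( \sum_i \alpha_{i^*}^{i} a_i \Big)^2 - \frac{1}{U} \Big( \sum_{i^*} u_{i^*} \sum_i \alpha_{i^*}^{i} a_i \Big)^2 \Big\} \]

\[ = E\Big\{ \sum_{i^*,\, i} u_{i^*} \alpha_{i^*}^{i} a_i^2 - \frac{1}{U} \sum_{i^*,\, i} u_{i^*}^2 \alpha_{i^*}^{i} a_i^2 - \frac{1}{U} \sum_{i^* \ne i^{*\prime}} \sum_{i \ne i'} u_{i^*} u_{i^{*\prime}} \alpha_{i^*}^{i} \alpha_{i^{*\prime}}^{i'} a_i a_{i'} \Big\}, \]
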
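where we have used the facts that
\[ P(\alpha_{i^*}^{i} \alpha_{i^*}^{i'} = 0) = 1, \quad i \ne i'; \]
\[ P(\alpha_{i^*}^{i} \alpha_{i^{*\prime}}^{i} = 0) = 1, \quad i^* \ne i^{*\prime}; \]
\[ (\alpha_{i^*}^{i})^2 = \alpha_{i^*}^{i}. \]
Recalling now that
\[ E(\alpha_{i^*}^{i}) = \frac{1}{A}, \qquad E(\alpha_{i^*}^{i} \alpha_{i^{*\prime}}^{i'}) = \frac{1}{A(A - 1)}, \]
and $\sum_{i \ne i'} a_i a_{i'} = -\sum_i a_i^2$, we obtain

\[ \sum_{i^*} u_{i^*} E(a^*_{i^*} - a^*_{.})^2 = \frac{\sum_i a_i^2}{A} \Big\{ U - \frac{1}{U} \sum_{i^*} u_{i^*}^2 + \frac{1}{(A - 1)U} \Big( U^2 - \sum_{i^*} u_{i^*}^2 \Big) \Big\} \]

\[ = U\sigma_a^2 \Big[ \frac{1}{A} (A - 1 - AU^* + U^* + 1 - U^*) \Big] \]

\[ = U(1 - U^*)\sigma_a^2. \]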


Hence the coefficient of $\sigma_a^2$ in the expectation of $A^* = A'/(a - 1)$, the mean
square for $\mathcal{A}$, is

\[ rUVW \frac{(1 - U^*)}{(a - 1)} \]
as given in Table 1. Note that if all $u_{i^*} = 1$, which would be the case if the total
number of observations of each observed level $i^*$ of $\mathcal{A}$ is the same, then $U =
a$, $U^* = 1/a$, and the coefficient becomes $rVW$.
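
The result $U(1 - U^*)\sigma_a^2$ can be confirmed by brute force on a small population. The sketch below enumerates every equally likely ordered selection of $a$ levels from $A$ and averages the weighted sum of squares exactly; the main effects (which must sum to zero), the weights, and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations

def expected_weighted_ss(effects, u):
    """E[ sum_{i*} u_{i*} (a*_{i*} - a*_.)^2 ] over all equally likely ordered
    selections of len(u) levels from the population main effects."""
    draws = list(permutations(range(len(effects)), len(u)))
    U = sum(u)
    total = Fraction(0)
    for d in draws:
        selected = [Fraction(effects[i]) for i in d]      # the a*_{i*}
        weighted_mean = sum(w * x for w, x in zip(u, selected)) / U
        total += sum(w * (x - weighted_mean) ** 2 for w, x in zip(u, selected))
    return total / len(draws)

def closed_form(effects, u):
    """U (1 - U*) sigma_a^2 with sigma_a^2 = sum(a_i^2) / (A - 1)."""
    A, U = len(effects), sum(u)
    U_star = Fraction(sum(w * w for w in u), U * U)
    sigma_a2 = Fraction(sum(x * x for x in effects), A - 1)
    return U * (1 - U_star) * sigma_a2

effects = [-3, -1, 0, 4]      # hypothetical main effects, summing to zero
weights = [1, 2, 3]           # hypothetical u_{i*}
enumerated = expected_weighted_ss(effects, weights)
formula = closed_form(effects, weights)
```

With equal weights the factor $U(1 - U^*)$ reduces to $a - 1$, so that after division by $(a - 1)$ the coefficient of $\sigma_a^2$ in the mean square is simply $rVW$.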
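As another example we consider

\[ \sum_{i^*} u_{i^*} E(p^*_{i^*...} - p^*_{....})^2 = E\Big\{ \sum_{i^*} u_{i^*} \Big[ \frac{1}{r u_{i^*} VW} \sum_{j^*k^*f} \sum_m \delta p_m - \frac{1}{rUVW} \sum_{i^*j^*k^*f} \sum_m \delta p_m \Big]^2 \Big\}, \]

where $\delta$ denotes $\delta_m^{i^*j^*k^*f}$,

\[ = \frac{1}{r^2V^2W^2} E\Big\{ \sum_{i^*} u_{i^*} \Big[ \frac{1}{u_{i^*}} \Big( \sum_{j^*k^*fm} \delta p_m \Big) - \frac{1}{U} \Big( \sum_{i^*j^*k^*fm} \delta p_m \Big) \Big]^2 \Big\}. \]

If the expressions in parentheses are expanded, it will be seen that a number of
the terms will vanish because of relationships like

\[ P(\delta_m^{i^*j^*k^*f} \delta_{m'}^{i^*j^*k^*f} = 0) = 1, \quad m \ne m'; \]

and

\[ P(\delta_m^{i^*j^*k^*f} \delta_m^{i^{*\prime}j^{*\prime}k^{*\prime}f'} = 0) = 1, \quad (i^*j^*k^*f) \ne (i^{*\prime}j^{*\prime}k^{*\prime}f'). \]

If we also use the facts that

\[ E(\delta_m^{i^*j^*k^*f}) = \frac{1}{P}, \]

\[ E(\delta_m^{i^*j^*k^*f} \delta_{m'}^{i^{*\prime}j^{*\prime}k^{*\prime}f'}) = \frac{1}{P(P - 1)}, \quad m \ne m', \quad (i^*j^*k^*f) \ne (i^{*\prime}j^{*\prime}k^{*\prime}f'), \]

\[ \sum_m p_m^2 = -\sum_{m \ne m'} p_m p_{m'}, \]

and if we use $(j^*k^*f)$ to denote $\sum_{j^*k^*f} (1)$, etc., then the expectation we seek to
evaluate is

\[ \frac{1}{r^2V^2W^2} \Big\{ \sum_{i^*} \frac{1}{u_{i^*}} \Big[ \frac{\sum_m p_m^2}{P} (j^*k^*f) - \frac{\sum_m p_m^2}{P(P - 1)} \{ (j^*k^*f \ne f') + (j^* \ne j^{*\prime} k^*ff') \]
\[ \qquad + (j^*k^* \ne k^{*\prime} ff') + (j^* \ne j^{*\prime} k^* \ne k^{*\prime} ff') \} \Big] - \frac{1}{U} \Big[ \frac{\sum_m p_m^2}{P} (i^*j^*k^*f) - \frac{\sum_m p_m^2}{P(P - 1)} \]
\[ \qquad \{ (i^*j^*k^*f \ne f') + (i^*j^* \ne j^{*\prime} k^*ff') \]
\[ \qquad + (i^*j^*k^* \ne k^{*\prime} ff') + (i^*j^* \ne j^{*\prime} k^* \ne k^{*\prime} ff') \]
\[ \qquad + (i^* \ne i^{*\prime} j^*k^*ff') + (i^* \ne i^{*\prime} j^* \ne j^{*\prime} k^*ff') \]
\[ \qquad + (i^* \ne i^{*\prime} j^*k^* \ne k^{*\prime} ff') + (i^* \ne i^{*\prime} j^* \ne j^{*\prime} k^* \ne k^{*\prime} ff') \} \Big] \Big\}. \]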


It remains only to write down the various values of the sums and collect terms. 
Thus 


(j* # j*k*ff’) — (j*k* # k*’ff’) 


(j*k*f) = » (1) = ze Nijsjexe = TUye > Vje Wie = Tuye V W; 
jokes 


jtk* 


G*kyYef)= + = 2. Nisjoee(Niejeee — 1) 


Iek*f gf! 


ruie 2X 3 Vjs 2 Wie — rue VW 
rueV*V WW? — rueVW; 
(j* # ik) = de Dd (a) 


ptkes g* kes’ 
g* 5°" 


> Niejeke Nisjeks = r ure > UjeVje 2 Wie 


j* yt 5?" 5° 5?’ 
ke 


Pu(V? — VV*) Ww; 
Gi*j*k*f) = rUVW; 
(i*j*k*f x f’) _ rU’?UtV’V Www has rUVW; 
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(i*7* i k*ff’) = PUU(V? — VV*) Wows; 
etc. Thus the coefficient of os Pm is 


aR RE a fue VW(P — 2) = (Pd. V4 WW? = rneVW) 
=. 7 7 ie Uge 


—ru(V’ — VV*Ww* — rueV VW — Wew'*) 
runVWi — Vs) — Ww) 


7 hUvW(P — 1) — (FUU*VV *WeW* — rUVW) 


rUUTV 1 — V*)Wew* — PUeUtV Ve — W*) 
PUUW id — Ved — Wt) — PU — U*) eV 


° 


rua — Ura — VeyWwew* — rod — U*)VVwr — W*) 
rvUvwa — U*)a — V*)a — W*)y} 


— a a7 1D ((P—1) + 1 — ru.VWI{V'W’ + (1 — VW" 
+VQ-W)+0-V)1 — W’)}] 
— ((P -—1)+1-—rTuvwWiUVwW +00 - Vw 4+ UV — 2”) 
+U-Voa-w)+0-UoVw+0-Uju-V yw 
+0 -U)Vi-w)+a-U)a - VW) — W)h)} 
70 PPA sp wr I’ 
Thus the coefficient of $\sigma_p^2$ in the expectation of A* is


(a—1) 1 


we AW Go 


i; 
as given in Table 1. 

In a similar way one can complete the evaluation of E(A*), proceeding from
component to component; and of course the other mean squares may be handled
in the same fashion. In view of the symmetry of factors one can write down, at
once, E(B*) and E(C*) from the results for E(A*), and likewise for the interaction
mean squares. A check on results is that the expectation of the total sum of squares
should equal the sum of the expectations.

The complexity of the formulae and also of the algebra is considerably reduced
when the number of observations per cell is a constant, say r.

While the operations with the statistical model may appear tedious, this is
more apparent than real, in that with some familiarity with the technique a good
deal of the writing can be decreased through short-cut notation and simplification
by inspection. Furthermore the operations are quite elementary and mechanical;
and in addition to the symmetries we have mentioned, there are others,
such as the symmetry with respect to $\sigma^2_{ab}$ and $\sigma^2_{ac}$ in E(A*).


13. A more symmetric form for ems; extension of results. The general for- 
mulae for ems may be put in a more symmetric form which is simpler in ap- 
pearance and which makes very simple the extension of the results to four or 
more factors. The modified form of the results involves certain linear combinations
of the defined components of variation, in terms of which the expectations of
mean squares have the appearance corresponding to an “all factors random, 
number of units infinite’’ situation. This general pattern for ems, involving ap- 
propriate and definite rules for forming the linear combinations of components 
of variation, has been obtained by one or both of the present authors for more 
complex designs and situations than we have studied in this paper; ramifications 
will be discussed in later communications. 

We shall consider, for definiteness, the results of Table 1 on ems for the case of 
proportional numbers. These results are given in Table 5 in terms of the following 
notation: 


2 1 1 2 
RB’ _— a One _ pO + Cade + ap Qin» + a ap Oe ve BCP Quen . 


>, and &, are defined analogously. 


: 2 1 ¢ 
cb = Te — 7 Tobe — p Var» + 


( Qasep . 


ip 


D.- and &,, are defined analogously. 
Cabe ~ P Qabep . 


1 


= = l ns 1 2 
9 A Qa — R ee ee 


: 2 - l 1 
Lap = Qop — B Var — a Qhep + BO Qice . 


x, and Y,, are defined analogously. 
+2 
Zabp = Qo» _ C Qabep . 
and >,-, are defined analogously. 
9 
Labcp = Gacy . 
Zo =o + ZLabcp +2 mbep +z “acp +2 mabp +z “cp +z bp +2 “ap + Zp 

2 2 2 

=o +o, + oy. 
TABLE 5


Symmetric form for the results of Table 1 


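Mean squares      Expected mean squares

$A$               $rUVW\,\frac{(1 - U^*)}{(a - 1)}\,(\Sigma_a + V^*\Sigma_{ab} + W^*\Sigma_{ac} + V^*W^*\Sigma_{abc}) + \Sigma_0$

$B$               $rUVW\,\frac{(1 - V^*)}{(b - 1)}\,(\Sigma_b + U^*\Sigma_{ab} + W^*\Sigma_{bc} + U^*W^*\Sigma_{abc}) + \Sigma_0$

$C$               $rUVW\,\frac{(1 - W^*)}{(c - 1)}\,(\Sigma_c + U^*\Sigma_{ac} + V^*\Sigma_{bc} + U^*V^*\Sigma_{abc}) + \Sigma_0$

$I_{AB}$          $rUVW\,\frac{(1 - U^*)(1 - V^*)}{(a - 1)(b - 1)}\,(\Sigma_{ab} + W^*\Sigma_{abc}) + \Sigma_0$

$I_{AC}$          $rUVW\,\frac{(1 - U^*)(1 - W^*)}{(a - 1)(c - 1)}\,(\Sigma_{ac} + V^*\Sigma_{abc}) + \Sigma_0$

$I_{BC}$          $rUVW\,\frac{(1 - V^*)(1 - W^*)}{(b - 1)(c - 1)}\,(\Sigma_{bc} + U^*\Sigma_{abc}) + \Sigma_0$

$I_{ABC}$         $rUVW\,\frac{(1 - U^*)(1 - V^*)(1 - W^*)}{(a - 1)(b - 1)(c - 1)}\,\Sigma_{abc} + \Sigma_0$

Residual          $\Sigma_0$
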

An inverse relationship giving the $\sigma^2$ and $Q$ quantities explicitly in terms of
the $\Sigma$'s is easily written down.

The form of the results given in Table 5 not only makes entirely clear the pat- 
tern for extension to more than three factors but also indicates what are, in general, 
the estimable quantities in the analysis of variance. It will be evident that an 
unbiased estimate, based on the analysis of variance mean squares, always exists 
for each $\Sigma$ quantity in Table 5. It is of interest that the $\Sigma$ quantities depend
only on the population sizes and not on the sample sizes.

To make explicit the pattern of extensions to more than three factors, we give
$E(I_{AB})$ when we have four factors $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, $\mathcal{D}$. The notation and definitions
implicit should be clear. We use $X$ as analogous to $U$, $V$, $W$, and $X^*$ as analogous
to $U^*$, $V^*$, $W^*$, with definitions of components of variation as before. Then

$E(I_{AB}) = \Sigma_0 + rUVWX\,\frac{(1 - U^*)(1 - V^*)}{(a - 1)(b - 1)}\,(W^*X^*\Sigma_{abcd} + W^*\Sigma_{abc} + X^*\Sigma_{abd} + \Sigma_{ab}),$
where 


2 L os 
Lered = Qarea — P Qircts ; 
os Dw
P Qabep + DP Qabedp ’ 


1 
-% 


= 
Qrep + CD Cabed 


1 2 1 2 

abe 5 Vabdp — APP Vadedp » 

+ ape , + pp? . cpp & . 
> ~ > 2 

Zo = Zp + Zap + Lop + ‘++ + Zabveap to, 
etc. 

Further discussion on the extension of results for expected mean squares to
other designs, on a more formal (operational) statement of definition of the $\Sigma$
quantities, and on the formal general reciprocal definition of the $\sigma^2$ and $Q$
quantities in terms of the $\Sigma$'s is deferred to a later publication.


14. Estimation of effects, interactions and errors. In many factorial experiments
one of the objectives of the experiment will be the estimation of contrasts,
such as $\sum_i k_i a_i$ with $\sum_i k_i = 0$, and in particular differences such as
$a_i - a_{i'}$; also the uncertainty (usually as measured by variance) of such estimates
needs to be estimated. The essential objective of this section is to illustrate
briefly the use of the statistical model in such "linear estimation" problems.

To simplify the exposition we shall deal with the case of two factors, which
is equivalent, formally, to putting $C = c = 1$ for the situation we developed
earlier. We can now drop the subscripts $k$ and $k^*$ and all interactions involving
$c$. The population model becomes

$Y_{ijm} = \mu + a_i + b_j + (ab)_{ij} + p_m + q_{ijm} + \epsilon_{ijm}.$

The statistical model becomes

$x_{i^*j^*f} = \mu + \sum_i \alpha_i^{i^*} a_i + \sum_j \beta_j^{j^*} b_j + \sum_{ij} \alpha_i^{i^*}\beta_j^{j^*}(ab)_{ij} + \sum_m \delta_m^{i^*j^*f} p_m + \sum_{ijm} \alpha_i^{i^*}\beta_j^{j^*}\delta_m^{i^*j^*f}(q_{ijm} + \epsilon_{ijm}).$

We recall that our experiment involved the random selection of $a$ levels from $A$
of factor $\mathcal{A}$ and $b$ levels from $B$ of $\mathcal{B}$, where $a \le A$, $b \le B$, and the random
allocation of the $ab$ selected treatment combinations to randomly selected experimental
units (from a population of size $P$), so that each selected treatment
$(i^*j^*)$ appeared $n_{i^*j^*}$ times, $n_{i^*j^*} \ge 1$.

For the case of $A > a$, $B > b$ the association of $n_{i^*j^*}$ values with population
treatment combinations $(ij)$ is a random one. For the case of $\mathcal{A}$ and $\mathcal{B}$ fixed
factors one of two situations might exist, namely when $i^*$ and $i$ (and $j^*$ and $j$)
are taken as the same index or when the range of $i^*$ ($j^*$) is a random permutation
of the range of $i$ ($j$). In the first case we can speak of having $n_{ij}$ observations for
treatment combination $ij$; in the second case we have $\sum_{i^*j^*} \alpha_i^{i^*}\beta_j^{j^*} n_{i^*j^*}$ observations,
a random variable having average value $\frac{1}{ab}\sum_{i^*j^*} n_{i^*j^*}$, for treatment
$(ij)$. To bypass this difficulty, we shall consider in this paper the case of equal
numbers, i.e., $n_{i^*j^*} = r \ge 1$, all $i^*j^*$.
Under this last condition,
Ze = p+ ais + b* + (ab)%.. + pee... + 


where 


(ab)%. = 7D ai'8i (ab); 


ete. 
With no further restrictions an a : 


when the right-hand side is determinate. This quantity will be indeterminate 
whenever population level 7 of @ is not included among those actually selected, 
for then both numerator and denominator above will be zero. Then, when 
x’ exists, the denominator above is 1 and 


“= ypta’' + z a’ [b* + (ab)*.. + Dre... + que. + €:..] 


=ptat : 2. or, + : >, BF (ab)iy + D> ai’ [p?... +97 + €%..]. 
) 7*3 ) 3*3 i* 


It should be noted that this statistical model for $x^{i\cdot\cdot}$ is conditional on level $i$
of $\mathcal{A}$ having been one of the selected $a$ levels of $\mathcal{A}$; hence, in this expression, we
take $P(\alpha_i^{i^*} = 1) = 1/a$, which is the conditional probability that selected
level $i^*$ corresponds to population level $i$, given that $i$ is selected.

In the last expression, all terms after the first two on the right-hand side have 
expectation zero, whatever the relation of B to b. For example 


ELXBF (ab). = = nh (ab,;] = 0; 
E (Y a; pi...) —— ay (20 pm) 


= ap. = 


F(X ai qi..] Si 2. 7. 2 a a dim | 


i*f ijm 


a ~ a8" | 


TO i*j*f jm 


>> Zz (2 qiim) = = 0; 


3 


— 
Thus $x^{i\cdot\cdot}$ is an unbiased estimate of

$\mu + a_i = Y_{i\cdot\cdot}$,

which is the conceptual over-all mean "true" response from all (population)
treatment combinations involving level $i$ of $\mathcal{A}$ on all (population) experimental
units. Hence, an unbiased estimate of the difference of the main effects of levels
$i$ and $i'$ of $\mathcal{A}$, $a_i - a_{i'}$, is given by $(x^{i\cdot\cdot} - x^{i'\cdot\cdot})$, when both of these quantities
are determinate, independent of whether either $\mathcal{A}$ or $\mathcal{B}$ is fixed or not, and
independent of whether interactions of factors with each other or with units are
negligible or not.

It may be appropriate here to emphasize that the difference $(a_i - a_{i'})$ is
independent of the other levels of $\mathcal{A}$ under study but is very much dependent, in
general, on what population of levels of $\mathcal{B}$ and of experimental units is under
consideration. (Note that the preceding sentence refers to population parameters
and not to sample estimates.)

In considering uncertainties involved in the estimation of $(a_i - a_{i'})$ by
$(x^{i\cdot\cdot} - x^{i'\cdot\cdot})$ it is clear from the model for $x^{i\cdot\cdot}$ that the estimate will be affected
by the interactions of levels $i$ and $i'$ of $\mathcal{A}$ with levels of $\mathcal{B}$ only if $\mathcal{B}$ is not
fixed, for if $\mathcal{B}$ is a fixed factor, then $\sum_{j^*}\sum_j \beta_j^{j^*}(ab)_{ij} = \sum_j (ab)_{ij} = 0$, independent
of $i$. On the other hand if $\mathcal{B}$ is not a fixed factor, then the term

$\frac{1}{b}\sum_{j^*}\sum_j \beta_j^{j^*}[(ab)_{ij} - (ab)_{i'j}]$

does not vanish from $(x^{i\cdot\cdot} - x^{i'\cdot\cdot})$.
If factor $\mathcal{B}$ is fixed and, further, unit-treatment interactions are negligible,
i.e., all $q_{ijm} = 0$, then


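$x^{i\cdot\cdot} - x^{i'\cdot\cdot} = a_i - a_{i'} + \sum_{i^*}(\alpha_i^{i^*} - \alpha_{i'}^{i^*})(p^*_{i^*\cdot\cdot} + \epsilon^*_{i^*\cdot\cdot}).$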


The variance of this estimate is 


E[> (a; . ai) (pie.. bez.) = + Ey, (ai* ae ai (3 2d s"’pa) | 
= + E(( > afnh+4 SY alah! Pm Pm’ 


rb itjefm iej*f sf! 
mm 


+ i at 65 583"! Dm) + (similar terms with 7’ for i) 
eI°sf P 
sey? ess’ 
m gem’ 


~ 2( > > « £ ale gic Pl gs i Sm Den’ 
septs?’ Je77’ 


mem’ 


+ 2 2d a ati? bm 8m” Dm Dw’) | 


t* ei?’ 
g* gag?” ages 


I(? mt r(r — 1)b _ rb(b — y 
re|\P P(P—1) P(P —1) 
+ (pga + Resale


P(P— 1) PP 
») , 
== > (0° + a3). 


Hence under the conditions that (i) unit-treatment interactions are zero and
(ii) $\mathcal{B}$ is a fixed factor (i.e., $B = b$), the variance of the estimate of the difference
of the main effects of two levels of $\mathcal{A}$ is estimated unbiasedly by 2R*/rb, where
R* is the "residual mean square" in the analysis of variance.
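
The role of randomization in this result can be illustrated by a small exact enumeration. The sketch below is not taken from the paper: it assumes no technical error, additive unit effects $p_m$ summing to zero, and the convention $\sigma_p^2 = \sum_m p_m^2/(P - 1)$; the unit effects and sizes are hypothetical. Under those assumptions the difference of two treatment means, each over $n = rb$ randomly allocated units, should have variance $2\sigma_p^2/n$.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical additive unit effects p_m (they sum to zero); P = 6 units.
unit_effects = [Fraction(v) for v in (-3, -1, 0, 1, 1, 2)]
P = len(unit_effects)
n = 2  # plays the role of rb: units allocated to each of the two levels

# Assumed convention: sigma_p^2 = sum(p_m^2) / (P - 1).
sigma_p2 = sum(p * p for p in unit_effects) / (P - 1)

# Enumerate every equally likely allocation of two disjoint sets of n units.
diffs = []
for first in combinations(range(P), n):
    rest = [m for m in range(P) if m not in first]
    for second in combinations(rest, n):
        mean_first = sum(unit_effects[m] for m in first) / n
        mean_second = sum(unit_effects[m] for m in second) / n
        diffs.append(mean_first - mean_second)

mean_diff = sum(diffs) / len(diffs)
var_diff = sum((d - mean_diff) ** 2 for d in diffs) / len(diffs)
print(mean_diff, var_diff, 2 * sigma_p2 / n)
```

Exact rational arithmetic is used so that the comparison with $2\sigma_p^2/n$ is not blurred by rounding.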

We consider next the variance of the estimate $(x^{i\cdot\cdot} - x^{i'\cdot\cdot})$ without the above
restrictions. Then


var (2° — 2") = = (o* + 0) 


+2[15 ZZ 83" ((ab)is — (ab)i3) + 2. (ai* — aia. | 


2 (+ os) + 5B BAX BF ((ab)y — (ab) iI 


+ 20 By BF a — (ab),)[(ab) sj — (ab) <-;-]} 


“2 si* 
ji’ 


1 _ *j*f 
+ rb? BLL ai * 8; é,” "Osim — X a fe 875 m Gitim| 


2 w+ » 4 1f0 Y [(ab)3; + (ab)?,; — 2(ab),;(ab),+5] 
x o op BIBS ad) ¥3 D) inj 2(ab) j;(ab) «5 


bb — 1) 
— BB 1% Mabdis + (adyi-5 — 2ab).s(ab)srs 


1 rb r(r — 1)b “a rb(b — 1) 
" aL BP PUP — Behe |S 2 dim P(P — 1)B(é — 1) By ee 


PB” P(P—1)B 


ym 


rb r(r — 1)b ] 2 rb(b — 1) Be 
+ i nears x. Qi'im P(P — 1)B(B — 1) <= Qi’ imi’ ji’ m 


2r’b(b — 1 
Zz diim qi’ jm iu... = iim wvsrn} 


° Pe pL + BP — 1)BB — 1) &. 


1 (B- 


5 BB 1) = [(ab):; — (ab) ,}* 


2 73 
at ied op) += 


1 1 


+ 3% BP(P — 1) 


[(P aa r) i (iim + qi" jm) + 2r > Qiim i jm| 
ym jm 
ieee .
bP(P — 1)B(B — 1) yu aie. m + Qi’ jm Vi’ j’m — 2ism Qi? j*m) 


ini” 


(B — b) 1 2 
= +5) +- $B. B-—1) 7 [(ab);; — (ab) 5] 


-1l)® ; (qiim - qi tim) — ; PP 1 py (Qaim = Qi?im) 
(b — 1) 
~ bP(P — 1 a 1)B(B — 1) ree 1) A (Qiim — Git jm) (Qijrm | Qi? j*m)« 


Before considering the estimation of this variance we shall obtain a useful
related quantity, namely, the average variance of estimates such as $(x^{i\cdot\cdot} - x^{i'\cdot\cdot})$:


aa =H 2 ik Meer 


on = (e+ oy + 0%) +7 (2 
9 2 
b [E(A*) — rboal. 


This displays explicitly (what is after all obvious) the relationship of the average
variance of estimates of differences and the analysis of variance ems. Clearly this
average variance can be estimated unbiasedly only if conditions are such that
an "unbiased error term" exists for $\sigma_a^2$, which will be true only when unit-treatment
interactions are negligible and/or the population of experimental units is
very large. In general, the estimate of this average variance based on the error
term for $\sigma_a^2$ given in Table 2 will be positively biased, i.e., the variance will tend
to be overestimated. With regard to the component $\sigma_{ab}^2$ we note that its importance
in this formula for the average variance of estimates of differences of main
effects of levels of $\mathcal{A}$ is determined by the relationship of $B$ (the population size
of levels of factor $\mathcal{B}$) to $b$ (the sample size for levels of $\mathcal{B}$) and not by any
considerations concerning $A$ and $a$.

If the average variance given above was felt not to be adequate as an estimate
of the variance of a specific difference $(x^{i\cdot\cdot} - x^{i'\cdot\cdot})$ one could carry out what
would amount to an analysis of variance involving only the observations of
relevance, namely, those that go into $(x^{i\cdot\cdot} - x^{i'\cdot\cdot})$. Thus if we extend our previous
notation to


a? 

. + Gy Diese i7e. 1 . 

x f. & Sees . and x a = Ee 
a* eo < 


ie Q; 


then the sums of squares of this “‘partial” analysis of variance would be 


Rigen ies r ije- i" je. - [os 
= Ysa 42°" — oh — 2"); 
“= j* 





M. B. WILK AND O. KEMPTHORNE 


> a —2"" — 2 42° PD Ie — c+ Ge" — 2”). 
i* a 


i.e., sums of squares for level $i$ versus level $i'$ of $\mathcal{A}$; sum of squares for levels of
$\mathcal{B}$ averaged over levels $i$ and $i'$ of $\mathcal{A}$; sums of squares for interactions of levels of
$\mathcal{B}$ with levels $i$ and $i'$ of $\mathcal{A}$; and residual. Clearly this partial analysis of variance
will bear the same relation to the variance of $(x^{i\cdot\cdot} - x^{i'\cdot\cdot})$ as the complete analysis
bears to the average variance of differences.

When interactions with experimental units are negligible, the residual mean 
square from the partial analysis of variance will have the same expectation as 
that for the complete one. 

When $B > b$ and the interactions of levels $i$ and $i'$ of $\mathcal{A}$ with levels of $\mathcal{B}$ may
be considerably different from the interactions of other levels of $\mathcal{A}$ with levels of
$\mathcal{B}$, it may be worth while carrying out the partial analysis of variance to obtain
estimates of the variance of the specific difference.

The preceding discussion can be applied symmetrically to factor $\mathcal{B}$ and be
extended to a three or more factor situation. Similarly the statistical model can 
be employed formally to answer questions involving the estimation of specific 
interactions, or differences of such, and to find variances of such estimates. 

So far as experimental unit variability is concerned, randomization is fully 
effective in providing unbiased linear estimates and in giving unbiased estimates 
of the component of variation corresponding to the additive unit errors; but, in 
general, randomization does not lead to the unbiased estimation of the con- 
tribution to variances of linear estimates due to the interactive unit error. It is, 
however, probably true that in many situations this latter bias will not be im- 
portant. 
ON A MEASURE OF THE INFORMATION PROVIDED BY
AN EXPERIMENT


By D. V. Lindley
University of Cambridge and University of Chicago 


1. Summary. A measure is introduced of the information provided by an 
experiment. The measure is derived from the work of Shannon [10] and involves 
the knowledge prior to performing the experiment, expressed through a prior 
probability distribution over the parameter space. The measure is used to 
compare some pairs of experiments without reference to prior distributions; 
this method of comparison is contrasted with the methods discussed by Black- 
well. Finally, the measure is applied to provide a solution to some problems 
of experimental design, where the object of experimentation is not to reach 
decisions but rather to gain knowledge about the world. 


2. Introduction. Shannon has introduced two important ideas into the theory 
of information in communications engineering. The first idea is that informa- 
tion is a statistical concept. The statistical frequency distribution of the sym- 
bols that make up a message must be considered before the notion can be 
discussed adequately. The second idea springs from the first and implies that 
on the basis of the frequency distribution, there is an essentially unique func- 
tion of the distribution which measures the amount of the information. It is the 
purpose of the present paper to apply these two ideas to statistical theory by 
discussing the notion of information in an experiment, rather than in a mes- 
sage. The second of Shannon’s ideas has been applied to statistical theory by 
Kullback and Leibler [6], [7], [8]; but our application is quite distinct from 
theirs. The interpretation of Shannon’s ideas in current statistical theory has 
been given by McMillan [9]. The discussion in that paper is related to, and 
partly inspired, that given here. A referee has kindly pointed out that Shan- 
non’s theory has been applied in psychometric problems by L. J. Cronbach in 
an unpublished report [14]. Definition 2, in particular, is used by Cronbach. 

The situation in communications engineering is that there is a transmitted 
message, x, which is received as a message, y. By considerations of the informa- 
tions in x and y it is possible to discuss the rate at which information has been 
transmitted along the channel. The analogous description in statistical theory 
is provided by replacing x by the knowledge of the state of nature, usually ex- 
pressed by the knowledge of a finite number of parameters, prior to an experi- 
ment, and by replacing y by the knowledge after the experiment. The comparison
of the knowledge before and after the experiment makes it possible to
discuss the amount of information provided by the experiment. The average
of this, for fixed prior knowledge, determines the average amount of information.
The measure of information is given by Shannon's function. But, just as
it is essential to consider the statistical character of the message x, so it is necessary
to consider the statistical character of the knowledge of the state of na-
ture. Prior probability distributions are therefore basic to the study. It seems 
obvious to the author that prior distributions, though usually anathema to 
the statistician, are essential to the notion of experimental information. To take 
an extreme case, if the prior distribution is concentrated on a single parameter 
value, that is, if the state of nature is known, then no experiment can be in- 
formative. 


It may happen that, whatever the prior knowledge, one experiment is more 
informative than another. We shall meet such examples below. In this case it 
is possible to compare the two experiments absolutely, without reference to 
prior knowledge. Methods of comparing experiments have been suggested by 
Bohnenblust, Shapley, and Sherman (described by Blackwell in [2]) and by 
Blackwell [2]. These methods of comparison are contrasted with the one pre- 
sented here, and it is shown that if one experiment is more informative than 
another by Blackwell’s criterion, then it is also true of that used here; the 
converse is false. 


The Bohnenblust method of comparison is formulated in decision theory 
language and involves considerations of losses. These notions are not used here; 
the concepts used are perhaps more related to the inference problem than to 
the decision problem (see Barnard [1]). In this paper it is suggested that al- 
though indisputably one purpose of experimentation is to reach decisions, 
another purpose is to gain knowledge about the state of nature (that is, about 
the parameters) without having specific actions in mind. The knowledge is 
measured by the amount of information, as described above. The following 
rule of experimentation is therefore suggested: perform that experiment for 
which the expected gain in information is the greatest, and continue experi- 
mentation until a preassigned amount of information has been attained. The 
consequences of this rule are explored and shown, for example, to lead to se- 
quential probability ratio tests. Binomial and normal sampling are also con- 
sidered as special cases. 


3. The experiment will result in an observation, $x$, belonging to a space, $\mathfrak{X}$.
The space $\mathfrak{X}$ has a $\sigma$-field, $\mathcal{B}$, of subsets, $X$. For every $\theta$ belonging to a space
$\Theta$ is defined a probability measure on $\mathcal{B}$. We shall suppose that as $\theta$ ranges
through $\Theta$ the probability measures on $\mathcal{B}$ are all absolutely continuous with
respect to a fixed measure on $\mathcal{B}$. This permits us to describe each probability
measure by a probability density function $p(x \mid \theta)$, such that the probability
measure of a subset $X$ is given by $\int_X p(x \mid \theta)\,dx$, where, for simplicity of notation,
we have denoted integration with respect to the dominating measure by
$dx$. The ordered quadruple$^3$ $\mathcal{E} = \{\mathfrak{X}, \mathcal{B}, \Theta, P\}$, where $P$ is the set of $p(x \mid \theta)$,
characterizes an experiment, $\mathcal{E}$. Again, for simplicity in notation, we shall not
distinguish between random variables and the values assumed by them, nor
shall we attempt to be specific in describing the density functions. Thus, $p(x)$
will denote the density function of the random variable $x$; similarly, $p(\theta)$ will
denote the density function of $\theta$, without any suggestion that the random
variables $x$ and $\theta$ have the same density. These devices avoid such clumsy notation
as $p_x(y)$ for the density of the random variable $x$ when $x$ assumes the numerical
value $y$.

We shall suppose that $\Theta$ is endowed with a $\sigma$-field of subsets; usually, $\Theta$ will
be a subset of n-dimensional Euclidean space and the $\sigma$-field will be the Borel
field. A prior distribution for $\theta$ will be a probability measure on this field, and
again we shall suppose it to be described by a probability density function $p(\theta)$
with respect to a measure denoted by $d\theta$. Thus, in accord with the notational
conventions described above, we have, for example,


(1) $p(x) = \int p(x \mid \theta)\,p(\theta)\,d\theta,$


and Bayes' theorem reads

(2) $p(\theta \mid x) = p(x \mid \theta)\,p(\theta)/p(x).$


The ranges of integration in the following formulas will always be the whole
space, either $\mathfrak{X}$ or $\Theta$, and will be omitted.

For a prior distribution $p(\theta)$, the amount of information with respect to $d\theta$
is defined to be


(3) $\mathcal{I}_0 = \int p(\theta) \log p(\theta)\,d\theta$


whenever the integral exists. For any $\theta$ for which $p(\theta) = 0$, define $p(\theta) \log p(\theta)$
to be zero. A useful alternative notation is


(4) $\mathcal{I}_0 = E_\theta \log p(\theta),$


where $E_\theta$ denotes the expectation operator with respect to $\theta$.

$^3$ Strictly, the quadruple should be a quintuple and should include the dominating
measure; for convenience, it will be omitted.

The reasons for the introduction of this function have been given by Shannon.
Translated into the language of experimentation, the basic reason is this: Consider
the case where $\Theta$ is finite; then the amount of information, $I$, in a prior
distribution can be measured by how much information it is necessary to provide
before the value of $\theta$ is known. This latter information could be provided
in two stages. For the first, let $\Theta_1$ be a non-empty proper subset of $\Theta$ with
$P = \int_{\Theta_1} p(\theta)\,d\theta \neq 0$ or 1, and suppose the experimenter is told whether $\theta \in \Theta_1$ or its
complement. This provides amount $I_1$, say; the prior distribution being
$(P, 1 - P)$. In the second stage, suppose the experimenter is told the value of
$\theta$; the information provided is $I_2$ or $I_3$, say, according as he knew $\theta \in \Theta_1$ or its
complement. (The necessary distributions are $p(\theta)/P$ and $p(\theta)/(1 - P)$, respectively.)
Then Shannon requires that the information provided in the first
stage and the average amount provided in the second stage add up to the total
information; that is,


$I = I_1 + P I_2 + (1 - P) I_3.$


This additivity requirement is the fundamental postulate. It finds its general
form in Theorem 2, below. Shannon then shows ([10], Appendix 2) that $I =
\sum p(\theta) \log p(\theta)$, apart from an arbitrary multiplying constant, is the only function
having this property together with a mild continuity property.
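
As a numerical illustration (not part of the original argument), the additivity postulate can be checked for a finite parameter space. The sketch below uses a hypothetical four-point prior and a hypothetical subset standing for $\Theta_1$, and evaluates $\sum p \log p$ with the convention that terms with $p = 0$ are zero.

```python
import math

def information(density):
    """Sum of p log p (no minus sign); terms with p == 0 count as zero."""
    return sum(p * math.log(p) for p in density if p > 0)

prior = [0.1, 0.2, 0.3, 0.4]   # hypothetical prior over four parameter values
subset = [0, 1]                # indices of the values forming Theta_1 (hypothetical)

P = sum(prior[i] for i in subset)
inside = [prior[i] / P for i in subset]
outside = [prior[i] / (1 - P) for i in range(len(prior)) if i not in subset]

I_total = information(prior)       # I
I_1 = information([P, 1 - P])      # first stage: told whether theta lies in Theta_1
I_2 = information(inside)          # second stage, given theta in Theta_1
I_3 = information(outside)         # second stage, given theta in the complement

additivity_gap = I_total - (I_1 + P * I_2 + (1 - P) * I_3)
print(I_total, additivity_gap)
```

The gap is zero up to rounding error for any choice of prior and subset with $0 < P < 1$.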

We note that the amount of information, so defined, is not invariant under 
a change of description of the parameter space. This lack of invariance need 
cause no concern, as it will disappear when the expression is used to define the 
average information in the experiment. The minus sign introduced by Shannon 
in front of the integral is not used. The reason for this is as follows: the maximum
information, in a statistician's sense, will be obtained when the probability
distribution is concentrated on a single value of $\theta$, and the information
will be reduced as the distribution of $\theta$ "spreads"; this is exactly the reverse of
the situation faced by a communications engineer, where the concentration on 
a single value would allow no choice in his messages. The two scales are there- 
fore reversed. 

After the experiment has been performed and the value $x$ observed, the
posterior distribution of $\theta$ is $p(\theta \mid x)$, given by (2), and the amount of information
is


(5) $\mathcal{I}_1(x) = \int p(\theta \mid x) \log p(\theta \mid x)\,d\theta.$


(If $p(\theta \mid x) = 0$, define the integrand to be zero.)

DEFINITION 1. The amount of information provided by the experiment $\mathcal{E}$,
with prior knowledge $p(\theta)$, when the observation is $x$, is


(6) $\mathcal{I}(\mathcal{E}, p(\theta), x) = \mathcal{I}_1(x) - \mathcal{I}_0.$


This expression is also not invariant under a change of description of the parameter
space.

The quantity $\mathcal{I}(\mathcal{E}, p(\theta), x)$ depends on $x$; some results are more informative
than others. However, since $\theta$ is regarded as a random variable, this quantity
may be averaged with respect to $x$ according to the probability density given
by (1). Hence, we have

DEFINITION 2. The average amount of information provided by the experiment
$\mathcal{E}$, with prior knowledge $p(\theta)$, is


(7) $\mathcal{I}(\mathcal{E}, p(\theta)) = E_x[\mathcal{I}_1(x) - \mathcal{I}_0].$
Alternative forms for $\mathcal{I}(\mathcal{E}, p(\theta))$ are
(8) $E_x E_\theta \log\{p(\theta \mid x)/p(\theta)\}$ (from (3) and (5)),
(9) $E_x E_\theta \log\{p(x \mid \theta)/p(x)\}$ (from (2)),


and, in full, if $p(x, \theta)$ is the joint density for $x$ and $\theta$,
(10) $\iint p(x, \theta) \log\{p(x, \theta)/p(x)p(\theta)\}\,dx\,d\theta.$


The expression (10) shows the symmetry between $x$ and $\theta$ and also exhibits the
fact that $\mathcal{I}(\mathcal{E}, p(\theta))$ is invariant under a 1-1 transformation of the parameter
space, $\Theta$. The expression occurs in Shannon's theory ([10], Section 24) for the
rate of transmission of information along a channel.$^4$

Yet another expression for $\mathcal{I}(\mathcal{E}, p(\theta))$ which is useful in calculation is obtained
by introducing the information operator, $I$, along with the expectation operator,
$E$. For a density function $p(y)$, we define


$I_y p(y) = \int p(y) \log p(y)\,dy.$

It is easy to verify that
(11) $\mathcal{I}(\mathcal{E}, p(\theta)) = E_\theta I_x p(x \mid \theta) - I_x E_\theta p(x \mid \theta).$
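
The agreement of the forms (7) to (11) can be checked numerically for a finite experiment. The following sketch is an illustration only: the prior and the likelihood table are hypothetical, and sums replace the integrals.

```python
import math

prior = [0.5, 0.3, 0.2]        # p(theta), hypothetical
likelihood = [                 # likelihood[t][x] = p(x | theta = t), hypothetical
    [0.7, 0.2, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]
thetas = range(len(prior))
xs = range(len(likelihood[0]))

joint = [[prior[t] * likelihood[t][x] for x in xs] for t in thetas]
p_x = [sum(joint[t][x] for t in thetas) for x in xs]               # equation (1)
posterior = [[joint[t][x] / p_x[x] for x in xs] for t in thetas]   # equation (2)

def info(density):
    """Discrete analogue of (3): sum of p log p, with 0 log 0 taken as zero."""
    return sum(p * math.log(p) for p in density if p > 0)

info_prior = info(prior)
form7 = sum(p_x[x] * (info([posterior[t][x] for t in thetas]) - info_prior) for x in xs)
form8 = sum(joint[t][x] * math.log(posterior[t][x] / prior[t]) for t in thetas for x in xs)
form9 = sum(joint[t][x] * math.log(likelihood[t][x] / p_x[x]) for t in thetas for x in xs)
form10 = sum(joint[t][x] * math.log(joint[t][x] / (p_x[x] * prior[t]))
             for t in thetas for x in xs)
form11 = sum(prior[t] * info(likelihood[t]) for t in thetas) - info(p_x)

forms = [form7, form8, form9, form10, form11]
print(forms)
```

All five values coincide up to rounding, and the common value is positive because the rows of the likelihood table differ.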


4. The results that we now proceed to establish involve only the use of Bayes'
theorem and the two facts that the logarithm of a product is the sum of the
logarithms of its factors (in the combination of equations (12) and (13) for example) and
that the function $x \log x$ is convex (in Theorem 1). We shall often denote the
average information by $\mathcal{I}(\mathcal{E})$ when the particular prior distribution does not
have to be stressed.

THEOREM 1. $\mathcal{I}(\mathcal{E}) \geq 0$, with equality if, and only if, $p(x \mid \theta)$ does not depend on
$\theta$, except possibly in a null set for $\theta$.

This follows immediately from a well-known inequality (see, for example,
Hardy, Littlewood, and Pólya [5], Theorem 205) on writing


$\mathcal{I}(\mathcal{E}) = \iint f(x, \theta) \log f(x, \theta) \cdot p(x)p(\theta)\,dx\,d\theta,$

where

$f(x, \theta) = p(x, \theta)/p(x)p(\theta).$

The inequality says that

$\mathcal{I}(\mathcal{E}) \geq \iint f(x, \theta)p(x)p(\theta)\,dx\,d\theta \cdot \log \frac{\iint f(x, \theta)p(x)p(\theta)\,dx\,d\theta}{\iint p(x)p(\theta)\,dx\,d\theta},$

with equality if, and only if, $f(x, \theta)$ equals a constant, except possibly on a
null set. The logarithm is zero, since both integrals in the ratio are equal to one.

$^4$ In the particular case of the "experiment" involved in radar work, the above ideas
are already contained in a paper by P. M. Woodward [12], and are repeated in [13]. The
author is indebted to M. S. Bartlett for these references.

The theorem says that, provided the density of $x$ varies with $\theta$, any experiment
is informative, on the average. Note that $\mathcal{I}(\mathcal{E}, p(\theta), x)$ is not necessarily
nonnegative. Although the expectation is positive, the experimental result may
reduce the amount of information. This can happen when a "surprising" value
of $x$ occurs; granted the correctness of the experimental technique, the "surprise"
may result in our being less sure about $\theta$ than before the experiment.
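
A two-point example makes the remark about "surprising" observations concrete. The numbers below are hypothetical: the prior strongly favours $\theta = 0$, and the observation $x = 1$ is unlikely under that value, so that observing it leaves the posterior evenly divided.

```python
import math

def info(density):
    """Sum of p log p, with 0 log 0 taken as zero."""
    return sum(p * math.log(p) for p in density if p > 0)

prior = [0.9, 0.1]                      # theta = 0 is strongly favoured beforehand
likelihood = [[0.9, 0.1], [0.1, 0.9]]   # likelihood[t][x] = p(x | theta = t)

p_x = [sum(prior[t] * likelihood[t][x] for t in range(2)) for x in range(2)]

gain = []
for x in range(2):
    posterior = [prior[t] * likelihood[t][x] / p_x[x] for t in range(2)]
    gain.append(info(posterior) - info(prior))          # Definition 1, equation (6)

average_gain = sum(p_x[x] * gain[x] for x in range(2))  # Definition 2, equation (7)
print(gain, average_gain)
```

The gain for the surprising result $x = 1$ is negative, the gain for $x = 0$ is positive, and the average over $x$ is positive, in agreement with Theorem 1.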

Suppose that the observations $x$ in an experiment $\mathcal{E}$ consist of a pair of observations
$x_1$, $x_2$. That is, every $x \in \mathfrak{X}$ is an ordered pair $(x_1, x_2)$ with
$x_i \in \mathfrak{X}_i$ $(i = 1, 2)$. Let $\mathcal{B}_i$ be the $\sigma$-field over $\mathfrak{X}_i$ induced from $\mathcal{B}$ by the transformation
$x_i = x_i(x)$, and let $P_i$ be the set of probability densities $p(x_i \mid \theta)$ of
the observations $x_i$ $(i = 1, 2)$. (It is again supposed that the measures are, for
all $\theta$, dominated by a measure so that the probability distributions can be characterized
by densities.) Then, $\mathcal{E}_i = \{\mathfrak{X}_i, \mathcal{B}_i, \Theta, P_i\}$ $(i = 1, 2)$ are two experiments
and $\mathcal{E}$ is said to be the sum of the experiments $\mathcal{E}_1$ and $\mathcal{E}_2$, written $\mathcal{E} =
(\mathcal{E}_1, \mathcal{E}_2)$. We shall also have to consider the experiment $\mathcal{E}_2(x_1) = \{\mathfrak{X}_2, \mathcal{B}_2, \Theta,
P_2(x_1)\}$, where $P_2(x_1)$ is the set of densities $p(x_2 \mid \theta, x_1)$.

Consider $\mathcal{I}(\mathcal{E}_2(x_1), p(\theta \mid x_1))$. Since $p(\theta \mid x_1)$ is the posterior distribution of $\theta$
after $x_1$ has been observed, this quantity is the average information provided
by an observation on $x_2$ after $\mathcal{E}_1$ has been performed and $x_1$ observed. The average
of it over $x_1$ is defined to be the average information provided by $\mathcal{E}_2$ after
$\mathcal{E}_1$ has been performed. We write it $\mathcal{I}(\mathcal{E}_2 \mid \mathcal{E}_1)$, again suppressing $p(\theta)$. A proof along
the lines of that for Theorem 1 establishes that $\mathcal{I}(\mathcal{E}_2 \mid \mathcal{E}_1) \geq 0$, with equality if,
and only if, $p(x_2 \mid \theta, x_1)$ does not involve $\theta$, except possibly on a null set.

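THEOREM 2. $\mathcal{I}(\mathcal{E}_1) + \mathcal{I}(\mathcal{E}_2 \mid \mathcal{E}_1) = \mathcal{I}(\mathcal{E})$.

We have, using the form (9),


(12) $\mathcal{I}(\mathcal{E}_1) = E_{x_1}E_\theta \log\{p(x_1 \mid \theta)/p(x_1)\} = E_{x_1}E_{x_2}E_\theta \log\{p(x_1 \mid \theta)/p(x_1)\}.$


Also, from the definitions immediately before the statement of the theorem,
(13) $\mathcal{I}(\mathcal{E}_2 \mid \mathcal{E}_1) = E_{x_1}[\mathcal{I}(\mathcal{E}_2(x_1), p(\theta \mid x_1))] = E_{x_1}E_{x_2}E_\theta \log\{p(x_2 \mid \theta, x_1)/p(x_2 \mid x_1)\}.$


Addition of (12) and (13) gives


$E_{x_1}E_{x_2}E_\theta \log\left\{\frac{p(x_2 \mid \theta, x_1)\,p(x_1 \mid \theta)}{p(x_2 \mid x_1)\,p(x_1)}\right\} = E_{x_1}E_{x_2}E_\theta \log\left\{\frac{p(x_1, x_2 \mid \theta)}{p(x_1, x_2)}\right\},$


which is $\mathcal{I}(\mathcal{E})$, and the theorem is proved.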
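
Theorem 2 can be verified numerically for a small discrete experiment. The sketch below is illustrative only: the prior and the joint table `pair` are hypothetical, and `average_information` is a helper name introduced here that evaluates form (9) by summation. The two observations are deliberately not independent given $\theta$.

```python
import math

def average_information(prior, likelihood):
    """Average information via form (9); likelihood[t][x] = p(x | theta = t)."""
    n_x = len(likelihood[0])
    p_x = [sum(pt * row[x] for pt, row in zip(prior, likelihood)) for x in range(n_x)]
    return sum(
        pt * row[x] * math.log(row[x] / p_x[x])
        for pt, row in zip(prior, likelihood)
        for x in range(n_x)
        if pt * row[x] > 0
    )

prior = [0.6, 0.4]
# pair[t][a][b] = p(x1 = a, x2 = b | theta = t); hypothetical values
pair = [
    [[0.4, 0.1], [0.2, 0.3]],
    [[0.1, 0.3], [0.2, 0.4]],
]
thetas, values = range(2), range(2)

like_joint = [[pair[t][a][b] for a in values for b in values] for t in thetas]
like_first = [[sum(pair[t][a]) for a in values] for t in thetas]        # p(x1 | theta)
p_first = [sum(prior[t] * like_first[t][a] for t in thetas) for a in values]

info_sum = average_information(prior, like_joint)      # I(E)
info_first = average_information(prior, like_first)    # I(E1)

info_second_given_first = 0.0                          # I(E2 | E1), as in (13)
for a in values:
    posterior = [prior[t] * like_first[t][a] / p_first[a] for t in thetas]
    like_second = [[pair[t][a][b] / like_first[t][a] for b in values] for t in thetas]
    info_second_given_first += p_first[a] * average_information(posterior, like_second)

print(info_first, info_second_given_first, info_sum)
```

The first two printed values add up to the third, as the theorem asserts.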

COROLLARY. If $x_1$ is sufficient for $x$ in the Neyman-Fisher sense, then $\mathcal{I}(\mathcal{E}_1) =
\mathcal{I}(\mathcal{E})$.

For if $x_1$ is sufficient for $x$, the factorization theorem shows that $p(x_2 \mid \theta, x_1)$
does not involve $\theta$. Hence, by the remark immediately before the statement of
the theorem, $\mathcal{I}(\mathcal{E}_2 \mid \mathcal{E}_1) = 0$, and the corollary is established.

The corollary establishes that there is no loss in information if attention is
confined to observation on a sufficient statistic. Conversely, if a statistic is considered
which is not sufficient (in the sense that it does not satisfy the factorization
theorem), then information will be lost since $\mathcal{I}(\mathcal{E}_2 \mid \mathcal{E}_1) > 0$. Theorem 2
generalizes to a finite number of experiments with common $\Theta$ in an obvious
manner.
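
The effect of sufficiency can be illustrated with two Bernoulli trials. In this sketch (hypothetical numbers; `average_information` is a helper name introduced here for form (9)), the total number of successes is sufficient for the pair of trials, whereas the first trial alone is not.

```python
import math

def average_information(prior, likelihood):
    """Average information via form (9); likelihood[t][x] = p(x | theta = t)."""
    n_x = len(likelihood[0])
    p_x = [sum(pt * row[x] for pt, row in zip(prior, likelihood)) for x in range(n_x)]
    return sum(
        pt * row[x] * math.log(row[x] / p_x[x])
        for pt, row in zip(prior, likelihood)
        for x in range(n_x)
        if pt * row[x] > 0
    )

success = [0.2, 0.5, 0.8]          # candidate success probabilities (the values of theta)
prior = [1 / 3, 1 / 3, 1 / 3]

def bernoulli(theta, outcome):
    return theta if outcome == 1 else 1 - theta

# Both trials, the sufficient statistic x1 + x2, and the first trial alone.
like_pair = [[bernoulli(th, a) * bernoulli(th, b) for a in (0, 1) for b in (0, 1)]
             for th in success]
like_total = [[(1 - th) ** 2, 2 * th * (1 - th), th ** 2] for th in success]
like_first = [[1 - th, th] for th in success]

info_pair = average_information(prior, like_pair)
info_total = average_information(prior, like_total)
info_first = average_information(prior, like_first)
print(info_pair, info_total, info_first)
```

The total carries the same average information as the full pair, while the first trial alone carries strictly less.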

DEFINITION 3. Two experiments, $\mathcal{E}_1$ and $\mathcal{E}_2$, with $\Theta_1 = \Theta_2 = \Theta$, are independent
if $p(x_1, x_2 \mid \theta) = p(x_1 \mid \theta)p(x_2 \mid \theta)$ for all $\theta \in \Theta$.

Of course it by no means follows that if $\mathcal{E}_1$ and $\mathcal{E}_2$ are independent, then $x_1$
and $x_2$ are independent; i.e., it is not usually true that $p(x_1, x_2) = p(x_1)p(x_2)$.

If & and & are independent, the experiments &2(27;) and & , defined above, 
are equivalent (in the sense that the four pairs of defining elements are all 
equal when we write & = &), and we have the result 


(14) 9(& | &) = E,,9(&2, p(@ | 21)). 
THEOREM 3. If & and & are independent 
I(&2 | &:) S 9(&2), 
with equality if, and only if, x; and x2 are independent. 
From (13) and the independence, we have 
9(&) — 9(&2 | &) = E,,Es log {p(x | 6)/p(x2)} 
- E,,E,,E¢ log | p(X2 | 6) /p(2xe 
= E,,E,,E¢ log | p(2 | 21)/p(x2)} 
= E,,E,, log |p(22 | 11)/p(x2)}. 
The last expression is identical with (9) when 22 , 2; are replaced by z, 6, respec- 
tively. By Theorem 1 it is therefore nonnegative, and is zero if, and only if, 
P(x2 | 1) = p(2). 

Again, the definition and theorem could be generalized to any finite number
of independent experiments. The theorem says that if $\mathcal{E}_1$ and $\mathcal{E}_2$ are independent
experiments, either one is more informative, on the average, if performed first
than if performed second. In particular, if $\mathcal{E}_1 = \mathcal{E}_2$, the theorem says that an
independent repeat of the same experiment is less informative, on the average,
than the original experiment. This is a property which agrees with the common
belief in the diminishing marginal utility of independent equidistributed
observations.

COROLLARY. If $\mathcal{E}_1$ and $\mathcal{E}_2$ are independent experiments, then

$\mathcal{I}(\mathcal{E}_1) + \mathcal{I}(\mathcal{E}_2) \ge \mathcal{I}(\mathcal{E}),$

with equality if, and only if, $x_1$ and $x_2$ are independent.

For

$\mathcal{I}(\mathcal{E}_1) + \mathcal{I}(\mathcal{E}_2) \ge \mathcal{I}(\mathcal{E}_1) + \mathcal{I}(\mathcal{E}_2 \mid \mathcal{E}_1)$  (by the theorem)
$\qquad = \mathcal{I}(\mathcal{E})$  (by Theorem 2).


The corollary is not necessarily true for experiments which are not independent.
It is easy to construct an example where $\mathcal{E}_1$ or $\mathcal{E}_2$ separately provide
no information, but jointly they are completely informative in the sense that
the posterior distribution is necessarily concentrated on a single value of $\theta$.

In the case of repetition of identical experiments, more than the result of
Theorem 3 can be said about the reduction of information on repetition. Let
$\mathcal{E}^{(1)} = \mathcal{E}_1$ be any experiment and let $\mathcal{E}_2, \mathcal{E}_3, \ldots$ be independent identical
experiments. Let $\mathcal{E}^{(2)} = (\mathcal{E}_1, \mathcal{E}_2)$ and generally $\mathcal{E}^{(n)} = (\mathcal{E}^{(n-1)}, \mathcal{E}_n)$; and let
$\mathcal{I}(\mathcal{E}^{(n)}) = j_n$; the prior distribution can remain unspecified.

THEOREM 4. $j_n$ is a concave, increasing function of $n$.

It will be enough to establish that

$0 \le j_{n+1} - j_n \le j_n - j_{n-1}.$

The first inequality follows from Theorem 2, for by that theorem

$j_{n+1} - j_n = \mathcal{I}(\mathcal{E}_{n+1} \mid \mathcal{E}^{(n)}) \ge 0.$

The second reads:

$\mathcal{I}(\mathcal{E}_{n+1} \mid \mathcal{E}^{(n)}) \le \mathcal{I}(\mathcal{E}_n \mid \mathcal{E}^{(n-1)}).$

Since $\mathcal{E}_n = \mathcal{E}_{n+1}$, it will be enough to show that

$\mathcal{I}(\mathcal{E}_{n+1} \mid \mathcal{E}^{(n)}) \le \mathcal{I}(\mathcal{E}_{n+1} \mid \mathcal{E}^{(n-1)}).$

This follows as a slight generalization of Theorem 3, saying that the additional
experiment $\mathcal{E}_n$ reduces the average information provided by $\mathcal{E}_{n+1}$, even after
$\mathcal{E}^{(n-1)}$.
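
A small numerical illustration of Theorem 4 may be helpful. The sketch below assumes, purely for illustration, repeated Bernoulli trials with a hypothetical two-point parameter space `THETAS` and prior `PRIOR`; because the number of ones is sufficient, the corollary to Theorem 2 allows j_n to be computed from the binomial distribution of that count. The helper names are not from the paper.

```python
import math

def info(prior, lik):
    """Average information sum_t sum_x p(t) p(x|t) log{p(x|t)/p(x)} (natural logs)."""
    nx = len(lik[0])
    px = [sum(p * row[x] for p, row in zip(prior, lik)) for x in range(nx)]
    return sum(p * row[x] * math.log(row[x] / px[x])
               for p, row in zip(prior, lik)
               for x in range(nx) if p > 0 and row[x] > 0)

def binom_pmf(n, theta):
    """Distribution of the number of ones in n independent Bernoulli(theta) trials."""
    return [math.comb(n, r) * theta**r * (1 - theta)**(n - r) for r in range(n + 1)]

def j(n, prior, thetas):
    """j_n: average information from n independent identical Bernoulli experiments."""
    return info(prior, [binom_pmf(n, t) for t in thetas])

# Hypothetical dichotomy: theta is 0.2 or 0.6 with equal prior probability.
PRIOR, THETAS = [0.5, 0.5], [0.2, 0.6]
J = [j(n, PRIOR, THETAS) for n in range(8)]       # j_0, ..., j_7
STEPS = [J[n + 1] - J[n] for n in range(7)]       # successive gains j_{n+1} - j_n
```

The entries of `STEPS` should be nonnegative and nonincreasing, which is the inequality 0 <= j_{n+1} - j_n <= j_n - j_{n-1} used in the proof.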

Consider the following experiment: With probability $\lambda$ (for all values of $\theta$),
perform experiment $\mathcal{E}_1$; with probability $1 - \lambda$ (for all values of $\theta$), perform
$\mathcal{E}_2$, where $\mathcal{E}_1$ and $\mathcal{E}_2$ have $\Theta_1 = \Theta_2 = \Theta$. The observation will consist in the
observation obtained, according to whichever experiment is performed, and the
knowledge of which experiment was performed. Denote this experiment by
$(\lambda \mathcal{E}_1 + (1 - \lambda)\mathcal{E}_2)$. In mathematical terms $(\lambda \mathcal{E}_1 + (1 - \lambda)\mathcal{E}_2) =
\{X = X_1 \cup X_2, \mathfrak{B} = \mathfrak{B}_1 \cup \mathfrak{B}_2, \Theta, P\}$, where $P$ is the set of densities, $p(x \mid \theta)$,
defined as follows: If $x \in X_1$, then $p(x \mid \theta) = \lambda p(x_1 \mid \theta)$ with $x = x_1$; if $x \in X_2$,
then $p(x \mid \theta) = (1 - \lambda) p(x_2 \mid \theta)$ with $x = x_2$. It is easy to verify that

(15)    $\mathcal{I}(\lambda \mathcal{E}_1 + (1 - \lambda)\mathcal{E}_2) = \lambda \mathcal{I}(\mathcal{E}_1) + (1 - \lambda)\mathcal{I}(\mathcal{E}_2).$

In this terminology the concavity property established in Theorem 4 says
that

$\mathcal{I}(\lambda \mathcal{E}^{(k)} + (1 - \lambda)\mathcal{E}^{(m)}) \le \mathcal{I}(\mathcal{E}^{(n)}),$

with $n = \lambda k + (1 - \lambda)m$. The last equality ensures that the average "sample
sizes" are the same, and the inequality says that rather than "mixing" two
sample sizes, it is better to take a sample of fixed "size" equal to the average
size of the mixture. We discuss the result again below.

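THEOREM 5. For fixed $\mathcal{E}$, $\mathcal{I}(\mathcal{E}, p(\theta))$ is a concave function of $p(\theta)$.

We have to show that if $p_1(\theta)$ and $p_2(\theta)$ are two prior probability densities
and $0 \le \lambda \le 1$, then

$\mathcal{I}(\mathcal{E}, \lambda p_1(\theta) + (1 - \lambda)p_2(\theta)) - \lambda \mathcal{I}(\mathcal{E}, p_1(\theta)) - (1 - \lambda)\mathcal{I}(\mathcal{E}, p_2(\theta)) \ge 0.$

The left-hand side is

$\iint p(x \mid \theta)\{\lambda p_1(\theta) + (1 - \lambda)p_2(\theta)\} \log \{p(x \mid \theta)/p(x)\}\, dx\, d\theta$
$\qquad - \lambda \iint p(x \mid \theta)\, p_1(\theta) \log \{p(x \mid \theta)/p_1(x)\}\, dx\, d\theta$
$\qquad - (1 - \lambda) \iint p(x \mid \theta)\, p_2(\theta) \log \{p(x \mid \theta)/p_2(x)\}\, dx\, d\theta,$

where $p_i(x) = \int p(x \mid \theta)\, p_i(\theta)\, d\theta$ $(i = 1, 2)$ and $p(x) = \lambda p_1(x) + (1 - \lambda)p_2(x)$.
This simplifies to give

$\lambda \iint p(x \mid \theta)\, p_1(\theta) \log \{p_1(x)/p(x)\}\, dx\, d\theta + (1 - \lambda) \iint p(x \mid \theta)\, p_2(\theta) \log \{p_2(x)/p(x)\}\, dx\, d\theta.$

Performing the integrations with respect to $\theta$, we have

$\lambda \int p_1(x) \log \{p_1(x)/p(x)\}\, dx + (1 - \lambda) \int p_2(x) \log \{p_2(x)/p(x)\}\, dx,$

and these integrals are nonnegative by the inequality used to establish Theorem 1.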
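
The concavity in the prior can be illustrated numerically. In the sketch below the likelihood table `LIK`, the priors `P1`, `P2`, and the weight `LAM` are hypothetical; the code compares the concavity gap with the weighted sum of the two divergence integrals that closes the proof. Function names are illustrative only.

```python
import math

def info(prior, lik):
    """Average information sum_t sum_x p(t) p(x|t) log{p(x|t)/p(x)} (natural logs)."""
    nx = len(lik[0])
    px = [sum(p * row[x] for p, row in zip(prior, lik)) for x in range(nx)]
    return sum(p * row[x] * math.log(row[x] / px[x])
               for p, row in zip(prior, lik)
               for x in range(nx) if p > 0 and row[x] > 0)

def marginal(prior, lik):
    """p(x) = sum_t p(x | t) p(t)."""
    return [sum(p * row[x] for p, row in zip(prior, lik)) for x in range(len(lik[0]))]

def divergence(p, q):
    """sum_x p(x) log{p(x)/q(x)}, the discrete form of the integrals in the proof."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

# Hypothetical experiment with three parameter values and a binary observation.
LIK = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
P1, P2, LAM = [0.7, 0.2, 0.1], [0.1, 0.3, 0.6], 0.4
MIX = [LAM * a + (1 - LAM) * b for a, b in zip(P1, P2)]

# Left-hand side of the inequality in the proof of Theorem 5.
GAP = info(MIX, LIK) - LAM * info(P1, LIK) - (1 - LAM) * info(P2, LIK)
# The simplified form after integrating out theta.
KL_FORM = (LAM * divergence(marginal(P1, LIK), marginal(MIX, LIK))
           + (1 - LAM) * divergence(marginal(P2, LIK), marginal(MIX, LIK)))
```

`GAP` and `KL_FORM` should coincide up to rounding, and both should be nonnegative.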
THEOREM 6. Let $\mathcal{E}_i = \{X, \mathfrak{B}, \Theta, P_i\}$ $(i = 1, 2)$. Let $\mathcal{E} = \{X, \mathfrak{B}, \Theta, P\}$, where
$P$ is the set of densities

$p(x \mid \theta) = \lambda p_1(x \mid \theta) + (1 - \lambda)p_2(x \mid \theta),$

with $0 \le \lambda \le 1$. Then

(16)    $\mathcal{I}(\mathcal{E}) \le \lambda \mathcal{I}(\mathcal{E}_1) + (1 - \lambda)\mathcal{I}(\mathcal{E}_2).$

(An alternative statement of this theorem reads: For fixed $X$, $\mathfrak{B}$, $\Theta$, and $p(\theta)$,
$\mathcal{I}(\mathcal{E})$ is a convex function of $P$.)

The experiment $\mathcal{E}$, described in the statement of the theorem, can be thought
of as being performed as follows: With probability $\lambda$, a value $x$ is obtained
according to the density $p_1(x \mid \theta)$; with probability $1 - \lambda$, $x$ is obtained according
to $p_2(x \mid \theta)$. The experimenter is informed only of $x$ and not of which event,
of probability $\lambda$ or $1 - \lambda$, took place. Let the experiment $\mathcal{E}^*$, on the other hand,
inform him about this event but not about the value of $x$. Then, clearly, using
the notation developed above,

$(\mathcal{E}, \mathcal{E}^*) = (\lambda \mathcal{E}_1 + (1 - \lambda)\mathcal{E}_2).$

Hence,

$\mathcal{I}(\mathcal{E}) + \mathcal{I}(\mathcal{E}^* \mid \mathcal{E}) = \lambda \mathcal{I}(\mathcal{E}_1) + (1 - \lambda)\mathcal{I}(\mathcal{E}_2)$

and the result follows since $\mathcal{I}(\mathcal{E}^* \mid \mathcal{E}) \ge 0$.

Note that we have a convexity property here and a concavity property in
the previous theorem.
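
The contrast between (15) and (16) can be seen in a short computation. The sketch below uses hypothetical likelihood tables `L1`, `L2`, a hypothetical prior, and a mixing weight `LAM`; `LABELLED` is the experiment in which the experimenter also learns which component was performed, and `MIXED` is the experiment of Theorem 6 in which he does not. The names are illustrative, not the paper's.

```python
import math

def info(prior, lik):
    """Average information sum_t sum_x p(t) p(x|t) log{p(x|t)/p(x)} (natural logs)."""
    nx = len(lik[0])
    px = [sum(p * row[x] for p, row in zip(prior, lik)) for x in range(nx)]
    return sum(p * row[x] * math.log(row[x] / px[x])
               for p, row in zip(prior, lik)
               for x in range(nx) if p > 0 and row[x] > 0)

# Hypothetical experiments on the same two-point parameter space.
PRIOR, LAM = [0.5, 0.5], 0.3
L1 = [[0.9, 0.1], [0.3, 0.7]]
L2 = [[0.6, 0.4], [0.1, 0.9]]

# Theorem 6: only x is reported, so the densities themselves are mixed.
MIXED = [[LAM * a + (1 - LAM) * b for a, b in zip(r1, r2)] for r1, r2 in zip(L1, L2)]
# Equation (15): the outcome also reveals which experiment was run,
# so the sample space is the disjoint union of the two sample spaces.
LABELLED = [[LAM * a for a in r1] + [(1 - LAM) * b for b in r2] for r1, r2 in zip(L1, L2)]

WEIGHTED = LAM * info(PRIOR, L1) + (1 - LAM) * info(PRIOR, L2)
I_MIXED = info(PRIOR, MIXED)
I_LABELLED = info(PRIOR, LABELLED)
```

`I_LABELLED` should equal `WEIGHTED`, as in (15), while `I_MIXED` should not exceed it, as in (16); the difference is the quantity written I(E* | E) in the proof.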


5. The previous development assigns to an experiment $\mathcal{E}$ and a prior
distribution $p(\theta)$ a numerical measure of the average information provided by $\mathcal{E}$.
In particular, this permits a comparison to be made between the amounts of
information provided by any two experiments $\mathcal{E}_1$, $\mathcal{E}_2$, with the same $\Theta$, with
respect to a prior distribution. It also allows $\mathcal{E}_1$ and $\mathcal{E}_2$ to be compared absolutely,
that is, without reference to a prior distribution, in certain cases. To do this
we introduce

DEFINITION 4. Let $\mathcal{E}_1$, $\mathcal{E}_2$ be two experiments with $\Theta_1 = \Theta_2 = \Theta$. $\mathcal{E}_1$ is more
informative than $\mathcal{E}_2$ if

(17)    $\mathcal{I}(\mathcal{E}_1, p(\theta)) \ge \mathcal{I}(\mathcal{E}_2, p(\theta))$

for all $p(\theta)$, and strict inequality holds for some $p(\theta)$. We write $\mathcal{E}_1 > \mathcal{E}_2$ or
$\mathcal{E}_2 < \mathcal{E}_1$. If equality holds in (17) for all $p(\theta)$, we say $\mathcal{E}_1$ and $\mathcal{E}_2$ are equally
informative and write $\mathcal{E}_1 = \mathcal{E}_2$. We write $\mathcal{E}_2 \le \mathcal{E}_1$, or $\mathcal{E}_1 \ge \mathcal{E}_2$, to mean either
$\mathcal{E}_2 < \mathcal{E}_1$ or $\mathcal{E}_2 = \mathcal{E}_1$.

There exist pairs of experiments for which neither $\mathcal{E}_1 \ge \mathcal{E}_2$ nor $\mathcal{E}_1 \le \mathcal{E}_2$. The
merits of such experiments can only be judged by reference to a prior distribution.
An example is given in the discussion of the binomial dichotomy after
Theorem 9, below.

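THEOREM 7. If $\mathcal{E}_1$, $\mathcal{E}_2$, $\mathcal{E}_3$ are three experiments with the same $\Theta$ and if $\mathcal{E}_3$ is
independent of both $\mathcal{E}_1$ and $\mathcal{E}_2$, then $\mathcal{E}_1 > \mathcal{E}_2$ implies $(\mathcal{E}_1, \mathcal{E}_3) > (\mathcal{E}_2, \mathcal{E}_3)$.

For any $p(\theta)$, by Theorem 2

$\mathcal{I}(\mathcal{E}_1, \mathcal{E}_3) = \mathcal{I}(\mathcal{E}_3) + \mathcal{I}(\mathcal{E}_1 \mid \mathcal{E}_3)$
$\qquad = \mathcal{I}(\mathcal{E}_3) + E_{x_3} \mathcal{I}(\mathcal{E}_1, p(\theta \mid x_3)),$

by (14), since $\mathcal{E}_1$ and $\mathcal{E}_3$ are independent. But $\mathcal{E}_1 > \mathcal{E}_2$ implies, in particular, that

$\mathcal{I}(\mathcal{E}_1, p(\theta \mid x_3)) \ge \mathcal{I}(\mathcal{E}_2, p(\theta \mid x_3))$

for any $x_3$. Consequently,

$\mathcal{I}(\mathcal{E}_1, \mathcal{E}_3) \ge \mathcal{I}(\mathcal{E}_3) + E_{x_3} \mathcal{I}(\mathcal{E}_2, p(\theta \mid x_3))$
$\qquad = \mathcal{I}(\mathcal{E}_3) + \mathcal{I}(\mathcal{E}_2 \mid \mathcal{E}_3),$

again by (14), since $\mathcal{E}_2$ and $\mathcal{E}_3$ are independent. A further application of Theorem
2 establishes the result.

5 That is, for all prior distributions, not merely for all prior distributions which are
dominated by a fixed measure.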

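THEOREM 8. If $\mathcal{E}_i$ $(i = 1, 2, 3, 4)$ are four experiments with the same $\Theta$ and if
$\mathcal{E}_1 > \mathcal{E}_2$, $\mathcal{E}_3 > \mathcal{E}_4$, $\mathcal{E}_1$ is independent of $\mathcal{E}_3$, and $\mathcal{E}_2$ of $\mathcal{E}_4$, then $(\mathcal{E}_1, \mathcal{E}_3) > (\mathcal{E}_2, \mathcal{E}_4)$.

Let $x_i$ be the random variable observed in $\mathcal{E}_i$. Then, for any value of $\theta$, $x_1$
is independent of $x_3$ and $x_2$ of $x_4$. Consider a new set of random variables
$(y_1, y_2, y_3, y_4)$, where, for any value of $\theta$, $y_i$ has the same density as $x_i$, $y_1$ is
independent of $y_3$, $y_2$ of $y_4$, and, in addition, $y_2$ is independent of $y_3$. Let $\mathcal{E}'_i$
denote the experiment in which $y_i$ is observed. Clearly, for any $p(\theta)$, $\mathcal{I}(\mathcal{E}'_i) =
\mathcal{I}(\mathcal{E}_i)$ and

$\mathcal{I}(\mathcal{E}_1, \mathcal{E}_3) = \mathcal{I}(\mathcal{E}'_1, \mathcal{E}'_3) \ge \mathcal{I}(\mathcal{E}'_2, \mathcal{E}'_3) \ge \mathcal{I}(\mathcal{E}'_2, \mathcal{E}'_4) = \mathcal{I}(\mathcal{E}_2, \mathcal{E}_4).$

Both inequalities follow from Theorem 7, and the result is established.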

Two other methods of comparing experiments have been introduced. The
first, due to Bohnenblust, Shapley, and Sherman, says that $\mathcal{E}_1$ is more informative
than $\mathcal{E}_2$ if every loss function attainable with $\mathcal{E}_2$ is also attainable with $\mathcal{E}_1$.
The second, due to Blackwell, says that $\mathcal{E}_1$ is sufficient for $\mathcal{E}_2$ if an experimenter
performing $\mathcal{E}_1$ can, by a random device, obtain a result equivalent to performing
$\mathcal{E}_2$. (For a precise definition of these two relations, see Blackwell [2].) To
avoid confusion, we shall speak of the relation introduced here as "more
informative (S)" and the relation in terms of loss as "more informative (B)".
For the latter, following Blackwell, we write $\mathcal{E}_1 \supset \mathcal{E}_2$, and for $\mathcal{E}_1$ is sufficient for
$\mathcal{E}_2$, we write $\mathcal{E}_1 \succ \mathcal{E}_2$. We remark that Theorems 7 and 8 above are the same as
two theorems of Blackwell's (see [4], p. 332) with $>$ replacing $\supset$. We now discuss
the connections between these three relations.

THEOREM 9. If $\mathcal{E}_1$ and $\mathcal{E}_2$ are two experiments with the same $\Theta$, and if $\mathcal{E}_1$ is
sufficient for $\mathcal{E}_2$, then $\mathcal{E}_1$ is not less informative (S) than $\mathcal{E}_2$. In other words, $\mathcal{E}_1 \succ \mathcal{E}_2$
implies $\mathcal{E}_1 \ge \mathcal{E}_2$.

Let $x_i$ be the random variable observed in $\mathcal{E}_i$ $(i = 1, 2)$. $\mathcal{E}_1 \succ \mathcal{E}_2$ implies that
there exists a stochastic transformation of $x_1$, say $x'_2$, such that $x'_2 \in X_2$ and
$x'_2$ and $x_2$ are identically distributed for each $\theta \in \Theta$. Let $\mathcal{E}'_2$ be the experiment in
which $x'_2$ is observed. Clearly, $\mathcal{I}(\mathcal{E}'_2) = \mathcal{I}(\mathcal{E}_2)$. Consider the experiment $\mathcal{E} =
(\mathcal{E}_1, \mathcal{E}'_2)$. Then, $x_1$ is sufficient for $(x_1, x'_2)$ in the Neyman-Fisher sense, and,
hence, by the Corollary to Theorem 2, $\mathcal{I}(\mathcal{E}) = \mathcal{I}(\mathcal{E}_1)$. But by Theorem 2, $\mathcal{I}(\mathcal{E}) \ge
\mathcal{I}(\mathcal{E}'_2)$; consequently, $\mathcal{I}(\mathcal{E}_1) \ge \mathcal{I}(\mathcal{E}_2)$ for all $p(\theta)$, as required.

Conditions are known under which the relations $\supset$ and $\succ$ are equivalent
(see, for example, Blackwell [3]). Under these conditions, it will follow from
Theorem 9 that not less informative (B) implies not less informative (S). That
the converse of these results is not true can be illustrated by an example.
Consider the case of a binomial dichotomy. Here $X$ contains two elements
(0, 1), $\Theta$ contains two elements with $0 \le \theta_1 \le \theta_2 \le 1$ and $p(x = 1 \mid \theta_i) = \theta_i =
1 - p(x = 0 \mid \theta_i)$. This experiment will be denoted by $\mathcal{E}(\theta_1, \theta_2)$. Denote the
prior distribution over $(\theta_1, \theta_2)$ by $(\lambda, 1 - \lambda)$ with $0 \le \lambda \le 1$. It follows immediately
from (11) that

(18)    $\mathcal{I}[\mathcal{E}(\theta_1, \theta_2), \lambda] = S(\lambda \theta_1 + (1 - \lambda)\theta_2) - \lambda S(\theta_1) - (1 - \lambda)S(\theta_2),$

where

$S(\theta) = -\theta \log \theta - (1 - \theta) \log (1 - \theta).$
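
Formula (18) is easy to evaluate and to compare with a direct computation of the average information. The sketch below does both; the particular values of theta_1, theta_2, and lambda used in the check are hypothetical, and the function names are not from the paper.

```python
import math

def S(theta):
    """S(theta) = -theta log theta - (1 - theta) log(1 - theta), with 0 log 0 = 0."""
    if theta <= 0.0 or theta >= 1.0:
        return 0.0
    return -theta * math.log(theta) - (1 - theta) * math.log(1 - theta)

def dichotomy_info(theta1, theta2, lam):
    """Right-hand side of (18) for the prior (lam, 1 - lam) on (theta1, theta2)."""
    return S(lam * theta1 + (1 - lam) * theta2) - lam * S(theta1) - (1 - lam) * S(theta2)

def info(prior, lik):
    """Direct average information sum_t sum_x p(t) p(x|t) log{p(x|t)/p(x)}."""
    nx = len(lik[0])
    px = [sum(p * row[x] for p, row in zip(prior, lik)) for x in range(nx)]
    return sum(p * row[x] * math.log(row[x] / px[x])
               for p, row in zip(prior, lik)
               for x in range(nx) if p > 0 and row[x] > 0)

def direct_info(theta1, theta2, lam):
    """Same quantity computed from the definition; columns are x = 0 and x = 1."""
    return info([lam, 1 - lam], [[1 - theta1, theta1], [1 - theta2, theta2]])

# Hypothetical values for the check.
VIA_18 = dichotomy_info(0.25, 0.5, 0.4)
VIA_DEFINITION = direct_info(0.25, 0.5, 0.4)
PERFECT = dichotomy_info(0.0, 1.0, 0.5)   # the completely informative dichotomy
```

The two computations should agree, and the completely informative dichotomy with an even prior should give log 2, the whole of the prior uncertainty.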


Consider a fixed experiment $\mathcal{E}(p_1, p_2)$ and compare it with $\mathcal{E}(\theta_1, \theta_2)$ as $\theta_1$ and
$\theta_2$ vary. To do this it is necessary to consider the right-hand side of (18) as a
function of $\lambda$ for $(p_1, p_2)$ and for $(\theta_1, \theta_2)$: $\mathcal{E}(p_1, p_2) \ge \mathcal{E}(\theta_1, \theta_2)$ if, and only if,

$\mathcal{I}[\mathcal{E}(p_1, p_2), \lambda] \ge \mathcal{I}[\mathcal{E}(\theta_1, \theta_2), \lambda]$

for all $\lambda$. It does not seem possible to describe the results analytically, and we
therefore content ourselves with summarizing the results of some computations
in the case $p_1 = \tfrac{1}{4}$, $p_2 = \tfrac{1}{2}$. The discussion is carried out with reference to Fig.
1, where $P$ is the point $(\tfrac{1}{4}, \tfrac{1}{2})$. It is known (see [2]) that the points $(\theta_1, \theta_2)$ in the
areas with horizontal hatching correspond to experiments which can be compared
with $\mathcal{E}(p_1, p_2)$ by either the relation $\supset$ or $\succ$, which are, in this case,
identical. For points in the triangular area, $\mathcal{E}(\theta_1, \theta_2) \subset \mathcal{E}(p_1, p_2)$; for points in
the quadrilateral, $\mathcal{E}(\theta_1, \theta_2) \supset \mathcal{E}(p_1, p_2)$; the remaining experiments are not
comparable with $\mathcal{E}(p_1, p_2)$. Theorem 9 implies that the relation $\supset$ may be replaced
by $\ge$, but computation shows that the points in the areas with vertical
hatching correspond to additional experiments which can be compared with
$\mathcal{E}(p_1, p_2)$ by the relation $\ge$. Those adjacent to the triangular area have
$\mathcal{E}(\theta_1, \theta_2) < \mathcal{E}(p_1, p_2)$ and those adjacent to the quadrilateral have $\mathcal{E}(\theta_1, \theta_2) >
\mathcal{E}(p_1, p_2)$. The points in the unhatched areas correspond to experiments which
cannot be compared by the relation $\ge$. The points in the area of vertical hatching
show that the converse of Theorem 9 is false.

The smallness of the unhatched region is a satisfactory feature of the comparison
by the relation $\ge$, for ideally all experiments would be comparable.

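The following considerations support the view that the relation $\ge$ holds in
substantially more cases than does the relation $\supset$. Blackwell [2] remarks that
the binomial trichotomy $\mathcal{E}_1 = \mathcal{E}(0, \tfrac{1}{2}, 1)$, in an obvious extension of the previous
notation, is not more informative (B) than $\mathcal{E}_2 = \mathcal{E}(0, \tfrac{1}{2}, \tfrac{1}{2})$. We shall show that
$\mathcal{E}_1 > \mathcal{E}_2$.

Let $p(\theta) = \lambda p_1(\theta) + (1 - \lambda)p_2(\theta)$ and let $p_1(\theta) = (p, 0, q)$ and
$p_2(\theta) = (p, q, 0)$. Then $p(\theta) = (p, q(1 - \lambda), q\lambda)$, a general prior distribution.
From considerations of binomial dichotomies we have $\mathcal{I}(\mathcal{E}_1, p_2(\theta)) = \mathcal{I}(\mathcal{E}_2, p_2(\theta))$
and $\mathcal{I}(\mathcal{E}_1, p_1(\theta)) \ge \mathcal{I}(\mathcal{E}_2, p_1(\theta))$. From Theorem 5

$\mathcal{I}(\mathcal{E}_1, p(\theta)) \ge \lambda \mathcal{I}(\mathcal{E}_1, p_1(\theta)) + (1 - \lambda)\mathcal{I}(\mathcal{E}_1, p_2(\theta))$
$\qquad \ge \lambda \mathcal{I}(\mathcal{E}_2, p_1(\theta)) + (1 - \lambda)\mathcal{I}(\mathcal{E}_2, p_2(\theta))$
$\qquad = \lambda \mathcal{I}(\mathcal{E}_2, p(\theta)) + (1 - \lambda)\mathcal{I}(\mathcal{E}_2, p(\theta))$
$\qquad = \mathcal{I}(\mathcal{E}_2, p(\theta)).$

The third line holds because $\mathcal{E}_2$ does not distinguish the second and third values
of $\theta$, so that its average information depends on the prior only through $p$.
Since $p(\theta)$ is arbitrary and the inequality is clearly strict for some $p(\theta)$, the
result is established.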

Another difference between the two methods of comparison is provided by
our Theorem 4. We deduced a result which we can now express as

$(\lambda \mathcal{E}^{(k)} + (1 - \lambda)\mathcal{E}^{(m)}) \le \mathcal{E}^{(n)},$

where $n = \lambda k + (1 - \lambda)m$. W. H. Kruskal has pointed out that the same result
is not necessarily true with $\le$ replaced by $\subset$. His example involves taking $\mathcal{E}$
to be a normal dichotomy, i.e., $\Theta = (\theta_1, \theta_2)$, $X$ is the real line, and

$p(x \mid \theta_i) = (2\pi)^{-1/2} \exp [-\tfrac{1}{2}(x - \theta_i)^2].$

Let $d_i$ $(i = 1, 2)$ be two decisions, with $d_i$ correct when $\theta = \theta_i$. Let $p_{ijn}(\delta)$ be
the probability of saying $d_i$ when $\theta = \theta_j$ on the evidence of the experiment $\mathcal{E}^{(n)}$,
using the decision function $\delta$. The relation $\supset$ can be expressed in terms of $p_{12n}(\delta)$
and $p_{21n}(\delta)$, and for some values of $c$, the function

$\inf_{\delta} \{p_{12n}(\delta) + c\, p_{21n}(\delta)\}$

is not concave. Thus it may, to quote an extreme case, produce a smaller loss
to do no experimentation with probability $(1 - \lambda)$ and to perform $\mathcal{E}^{(k)}$ with
probability $\lambda$ than to do $\mathcal{E}^{(n)}$ with $n = \lambda k$.

We conclude this section by discussing another example of Blackwell's (see [2])
which demonstrates the techniques of the present theory. Each member of a
large population of individuals has or has not each of two characteristics H, S.
The proportions, $h$ and $s$, of individuals with characteristics H, S are known.
The proportion $w$ of individuals having both characteristics is not known. Let
$\mathcal{E}(H)$ denote the experiment in which a random individual from the population
of individuals having characteristic H is observed; use $\mathcal{E}(\sim H)$, $\mathcal{E}(S)$, and $\mathcal{E}(\sim S)$
similarly, where $\sim H$ denotes the absence of the characteristic H. Suppose,
without loss of generality, that the characteristics are so named that $0 \le h \le s \le \tfrac{1}{2}
\le 1 - s \le 1 - h \le 1$. We proceed to show that $\mathcal{E}(H)$ is not less informative (S)
than any of the other three experiments; that is, the best experiment is that in
which individuals with the rarest characteristic are observed. Blackwell established
the same result for not less informative (B) when $w$ is known to be either
$hs$ or some specific alternative value different from $hs$. Our result holds for any
prior distribution of $w$.


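Each of the four experiments is binomial with the following probabilities
attached:

$\mathcal{E}(H)$:       $\mathrm{pr}(S) = w/h = \theta,$
$\mathcal{E}(\sim H)$:  $\mathrm{pr}(\sim S) = \dfrac{1 - h - s + w}{1 - h} = \dfrac{1 - h - s}{1 - h} + \dfrac{h}{1 - h}\,\theta,$
$\mathcal{E}(S)$:       $\mathrm{pr}(H) = w/s = h\theta/s,$
$\mathcal{E}(\sim S)$:  $\mathrm{pr}(\sim H) = \dfrac{1 - h - s + w}{1 - s} = \dfrac{1 - h - s}{1 - s} + \dfrac{h}{1 - s}\,\theta,$

where $\theta = w/h$. The permissible range for $\theta$ is $0 \le \theta \le 1$. Consider an arbitrary
prior distribution for $\theta$.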

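Now each of the four experiments is binomial with probability of the form
$\lambda c + (1 - \lambda)\theta$, with $0 \le \lambda \le 1$, $0 \le c \le 1$. Alternatively, by introducing a
random variable which is 1 or 0 according as the event, indicated above, does
or does not occur, the probability density is

$p(1 \mid \theta) = \lambda c + (1 - \lambda)\theta = \lambda p_1(1 \mid \theta) + (1 - \lambda)p_2(1 \mid \theta),$

where $p_1(1 \mid \theta) = c$, $p_2(1 \mid \theta) = \theta$. Let $\mathcal{E}_1$, $\mathcal{E}_2$ be experiments with $P_1 = \{p_1\}$,
$P_2 = \{p_2\}$. Then if $\mathcal{E}$ is any of the four experiments considered above, we have
by Theorem 6

$\mathcal{I}(\mathcal{E}) \le \lambda \mathcal{I}(\mathcal{E}_1) + (1 - \lambda)\mathcal{I}(\mathcal{E}_2) = (1 - \lambda)\mathcal{I}(\mathcal{E}_2) \le \mathcal{I}(\mathcal{E}_2),$

since $\mathcal{I}(\mathcal{E}_1) = 0$ as $p_1$ does not depend on $\theta$. But $\mathcal{E}(H)$ has $\lambda = 0$, so that $\mathcal{E}(H) = \mathcal{E}_2$.
This establishes the result, since the prior distribution is arbitrary.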


6. Since Wald’s introduction of decision theory, many statisticians, the pres- 
ent author included, have identified the theory with statistical theory and have 
argued that modern statistics is decision theory. Some statisticians, for exam- 
ple, Barnard [1] and Fisher [15], have not supported this view; they have con- 
tended, for example, that the purpose of a significance test is different from the 
purpose of a Wald decision problem with two decisions, reject or accept. It is 
therefore contended that different mathematical models are needed for the two 
purposes. This latter view is supported by the fact that significance levels do 
not occur in decision theory. If the purpose of modern statistics is not to come 
to decisions, we may ask what is its purpose? Without wishing to take sides in
the issue we propose in this section of the paper to investigate some elementary 
consequences of the attitude that the purpose of some statistical experimenta- 
tion is to gain and measure information about the state of nature. 

The first consequence of this attitude is that the statistician, faced with a
choice of one among several experiments that he might perform, will choose that
one for which the average amount of information is the greatest. The choice
will, in general, depend on his prior knowledge, but it may happen that the
experiments will be absolutely comparable by the methods of Section 5 and the
prior knowledge will be irrelevant. Examples have already been given, but there
is one further case worth considering. Let $\mathcal{E}(\sigma)$ denote the experiment in which
$X$ and $\Theta$ are the real lines and

$p(x \mid \theta) = (\sqrt{2\pi}\,\sigma)^{-1} \exp [-(x - \theta)^2/2\sigma^2],$

where $\sigma > 0$. Here $x$ is normally distributed about $\theta$ with known variance $\sigma^2$.
We shall show that $\mathcal{E}(\sigma_1) > \mathcal{E}(\sigma_2)$ if $\sigma_1 < \sigma_2$; that is, the experiment with
smaller variance is the more informative (S). To prove the result, we show
that $\mathcal{E}(\sigma_1) \succ \mathcal{E}(\sigma_2)$ and then apply Theorem 9, with the additional remark that
there obviously exists a $p(\theta)$ such that $\mathcal{I}(\mathcal{E}(\sigma_1), p(\theta)) > \mathcal{I}(\mathcal{E}(\sigma_2), p(\theta))$. Let
$x_i$ be the random variable observed in $\mathcal{E}(\sigma_i)$. Then

(19)    $x'_2 = x_1 + u$

(where $u$ is a random variable, independent of $x_1$, and having a normal
distribution with zero mean and variance $\sigma_2^2 - \sigma_1^2$) has, for each $\theta$, the same
distribution as $x_2$. Equation (19) is thus a stochastic transformation from $x_1$ to $x_2$
and hence $\mathcal{E}(\sigma_1) \succ \mathcal{E}(\sigma_2)$.


A measure of the information can only be provided by assuming a particular
form for $p(\theta)$. Suppose that

$p(\theta) = (\sqrt{2\pi}\,\tau)^{-1} \exp [-(\theta - \mu)^2/2\tau^2] = p_\tau,$

for some $\mu$ and $\tau > 0$. It is easy to establish that $p(x)$ is a normal density with
mean $\mu$ and variance $\sigma^2 + \tau^2$. Also,

$I_\theta\, p_\tau(\theta) = -\log\, (2\pi e)^{1/2}\tau.$

Consequently, by equation (11), we have

$\mathcal{I}(\mathcal{E}(\sigma), p_\tau) = \tfrac{1}{2} \log (1 + \tau^2/\sigma^2).$

This result provides an illustration of the truth of Theorem 4. If we use the
notation of that theorem, with $\mathcal{E} = \mathcal{E}(\sigma)$, we have that

$j_n = \tfrac{1}{2} \log (1 + n\tau^2/\sigma^2),$

which can be contrasted with the usual measure $n/\sigma^2$. Notice that $j_n$ increases
without limit in this situation.
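
The behavior of j_n in the normal case can be tabulated directly. The sketch below evaluates the closed form and cross-checks it against the difference between the posterior and prior values of the quantity written above as minus the logarithm of (2 pi e)^{1/2} times the standard deviation; the numerical values of sigma and tau are hypothetical and the function names are not from the paper.

```python
import math

def j_normal(n, sigma, tau):
    """j_n = (1/2) log(1 + n tau^2 / sigma^2) for n repetitions of the normal experiment."""
    return 0.5 * math.log(1.0 + n * tau**2 / sigma**2)

def posterior_variance(n, sigma, tau):
    """Posterior variance of theta after n observations: sigma^2 tau^2 / (n tau^2 + sigma^2)."""
    return sigma**2 * tau**2 / (n * tau**2 + sigma**2)

def normal_information(variance):
    """Integral of p log p for a normal density: -(1/2) log(2 pi e variance)."""
    return -0.5 * math.log(2.0 * math.pi * math.e * variance)

# Hypothetical values of sigma and tau.
SIGMA, TAU = 2.0, 1.5
J = [j_normal(n, SIGMA, TAU) for n in range(0, 50)]
GAIN_3 = normal_information(posterior_variance(3, SIGMA, TAU)) - normal_information(TAU**2)
```

Since the posterior variance does not depend on the observations, the gain in information is the same for every sample, which is why `GAIN_3` should equal `J[3]`. The sequence `J` is increasing and concave, and it grows like half the logarithm of n rather than linearly as n/sigma^2 does.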
Consider now the $k$-dimensional extension of these results. Let $X$ and $\Theta$ be
$k$-dimensional Euclidean spaces and let $x = \{x_1, \ldots, x_k\}$ have a multivariate
normal density with mean $\theta = \{\theta_1, \ldots, \theta_k\}$ and dispersion matrix $C$, which is
known. Let $\theta$ have a prior density, $p_A$, which is multivariate normal with mean
$\mu$ and dispersion matrix $A$. Denote this experiment by $\mathcal{E}(C)$; then, calculation
along the same lines as in the univariate case gives

$\mathcal{I}(\mathcal{E}(C), p_A) = \tfrac{1}{2} \log \{|A + C| / |C|\},$

where $|C|$ is the determinant of $C$. Clearly, even for this limited class of prior
distributions, the two experiments $\mathcal{E}(C_1)$ and $\mathcal{E}(C_2)$ will not be absolutely
comparable since their relative average informations depend critically on $A$.
However, in some circumstances there is a possible simplification. Generally, we
have that

$\mathcal{I}(\mathcal{E}(C_1), p_A) > \mathcal{I}(\mathcal{E}(C_2), p_A)$

if, and only if,

$|A + C_1|\,|C_2| > |A + C_2|\,|C_1|,$

that is, if, and only if,

$|I + A^{-1}C_1|\,|C_2| > |I + A^{-1}C_2|\,|C_1|.$

If the elements of $A^{-1}C_i$ are small in comparison with the unit matrix, this is
approximately

$|C_2| > |C_1|.$

Hence, an approximate basis of comparison in this case, which corresponds to
considerable ignorance about $\theta$, is through the determinant of the dispersion
matrix. The determinant criterion has been used by Wald [11] in a
slightly different context.
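
The dependence of the comparison on A can be made concrete in two dimensions. The sketch below is a hypothetical example, not taken from the paper: `C1` and `C2` have equal determinants, so the determinant criterion cannot separate them, and the ordering of the two experiments reverses between the prior dispersion matrices `A_FLAT` and `A_SKEW`. Only 2 x 2 matrices are handled, to keep the code free of external libraries.

```python
import math

def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def add2(a, b):
    """Entrywise sum of two 2 x 2 matrices."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def info_mvn(A, C):
    """(1/2) log{|A + C| / |C|} for prior dispersion A and known dispersion C."""
    return 0.5 * math.log(det2(add2(A, C)) / det2(C))

# Hypothetical dispersion matrices with |C1| = |C2| = 4.
C1 = [[1.0, 0.0], [0.0, 4.0]]
C2 = [[2.0, 0.0], [0.0, 2.0]]
A_FLAT = [[1.0, 0.0], [0.0, 1.0]]   # equal prior uncertainty in both coordinates
A_SKEW = [[1.0, 0.0], [0.0, 4.0]]   # more prior uncertainty in the second coordinate

FLAT = (info_mvn(A_FLAT, C1), info_mvn(A_FLAT, C2))
SKEW = (info_mvn(A_SKEW, C1), info_mvn(A_SKEW, C2))
```

Under `A_FLAT` the first experiment has the larger average information, under `A_SKEW` the second, which illustrates why the two experiments are not absolutely comparable.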

A second consequence of the view that one purpose of statistical experimentation
is to gain information will be that the statistician will stop experimentation
when he has enough information. Such a sequential method does not involve
considerations of risks or cost of experimentation, but does involve a statement
of prior knowledge. We consider next the sequential methods that this idea
results in, for some special cases. In each case we shall consider a sequence
$\mathcal{E}_1, \mathcal{E}_2, \ldots$ of independent, identical experiments which are to be performed
until enough information about $\theta$ has been obtained. It is therefore a question
of how much repetition of a given experiment should be performed.

We first take the dichotomy, with $\Theta = (\theta_1, \theta_2)$. $X$, $\mathfrak{B}$, and $P$ are quite general.
Let $\delta$ be some preassigned number. Then experimentation will proceed; after $n$
repetitions we shall have observations $(x_1, x_2, \ldots, x_n)$ and the amount of
information will be

(20)    $\sum_{i=1}^{2} p_n(\theta_i) \log p_n(\theta_i),$

where

$p_n(\theta_i) = p(\theta_i \mid x_1, \ldots, x_n),$

the posterior distribution of $\theta$. According to the idea introduced above,
experimentation will continue until (20) is not less than $\delta$. It is supposed that $\delta$ is
chosen so that this sequential scheme will terminate with probability one; in
this case $\delta$ must be negative. Since (20) is a convex function of $p_n(\theta_1) = 1 - p_n(\theta_2)$,
the scheme corresponds to continuing sampling if, and only if,

(21)    $1 - A < p_n(\theta_1) < A,$

where

$A \log A + (1 - A) \log (1 - A) = \delta.$

Expression (21) may be written in terms of the ratio of posterior probabilities
for $\theta_1$ and $\theta_2$, and by use of Bayes' theorem, it may be put in the form

$\dfrac{1 - A}{A}\,\dfrac{p(\theta_2)}{p(\theta_1)} < \dfrac{p(x_1, \ldots, x_n \mid \theta_1)}{p(x_1, \ldots, x_n \mid \theta_2)} < \dfrac{A}{1 - A}\,\dfrac{p(\theta_2)}{p(\theta_1)}.$

It is now apparent that the sampling scheme is equivalent to a scheme used in a
Wald sequential probability ratio test of $\theta_1$ against $\theta_2$.
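
The stopping rule for the dichotomy can be sketched in a few lines. The code below finds the boundary A by bisection and then updates the posterior odds one observation at a time; the choice of unit-variance normal observations, the parameter values, the data, and all function names are assumptions made for illustration and are not part of the paper.

```python
import math

def neg_entropy2(p):
    """p log p + (1 - p) log(1 - p), with 0 log 0 taken as 0; this is expression (20)."""
    return sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0)

def boundary(delta, tol=1e-12):
    """Solve A log A + (1 - A) log(1 - A) = delta for A in (1/2, 1) by bisection."""
    if not -math.log(2) < delta < 0:
        raise ValueError("delta must lie strictly between -log 2 and 0")
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if neg_entropy2(mid) < delta:
            lo = mid
        else:
            hi = mid
    return hi

def run(xs, prior1, theta1, theta2, delta):
    """Sample until the posterior probability of theta1 leaves (1 - A, A).

    Observations are assumed N(theta, 1). Returns (stopping time, posterior of theta1),
    with stopping time None if the data run out first.
    """
    a = boundary(delta)
    log_odds = math.log(prior1 / (1.0 - prior1))
    p1 = prior1
    for n, x in enumerate(xs, start=1):
        # log likelihood ratio of theta1 against theta2 for one N(theta, 1) observation
        log_odds += -0.5 * (x - theta1) ** 2 + 0.5 * (x - theta2) ** 2
        p1 = 1.0 / (1.0 + math.exp(-log_odds))
        if not (1.0 - a < p1 < a):
            return n, p1
    return None, p1

# Hypothetical data and settings.
DELTA = -0.2
A = boundary(DELTA)
STOP, POSTERIOR = run([0.3, -0.4, 0.1, -0.8, -0.2, 0.5, -1.0], 0.5, 0.0, 1.0, DELTA)
```

Because the rule depends on the data only through the accumulated log likelihood ratio, it has the same form as the continuation region of a sequential probability ratio test, with limits determined by A and the prior odds.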

The generalization to the case where $\Theta$ has $n$ elements will be sufficiently
illustrated by the trichotomy $n = 3$. The argument is as with the dichotomy
up to the sentence before that in which (21) appears. Now the posterior
distribution $p_n(\theta_i)$ may be represented by a point in an equilateral triangle of unit
altitude, the distances of the point from the sides being $p_n(\theta_i)$ $(i = 1, 2, 3)$.
Since (20) is again a convex function of the distribution, it follows that for
sufficiently large values of $\delta$, but $\delta < 0$, the regions of values of $p_n(\theta_i)$ for which
sampling will cease will be three congruent convex regions at the three corners
of the triangle. The calculation of the exact shapes of the regions would be a
simple matter. It is interesting to note that regions of similar convex structure
are obtained for termination in an optimum sequential scheme for deciding
between three simple hypotheses with given loss function and prior distribution
(see, for example, Blackwell and Girshick [4], p. 262).

We now leave the case where $\Theta$ is finite and suppose $\Theta$ to be an interval on
the real line. It is now necessary to remark that as Shannon's measure of information
is not invariant under a change of description of the parameter space,
a different sequential scheme will be obtained if the description is changed. This
unpleasant feature need not bother us unduly since sequential schemes based,
for example, on the variance will have a similar feature. A sampling scheme in
which sampling is continued until the variance of the estimator of $\theta$ is less than
some prescribed number will differ from one designed for the variance of the
estimator of $\phi(\theta)$. It is possible to find invariant sequential schemes by the device
of sampling until the average amount of information to be gained by taking a
further sample falls below a prescribed limit. It can then be argued that the
further sample is not worth taking and sampling can therefore cease. We shall
not investigate such schemes here; they will be invariant since the expression
for the average amount of information is invariant.


First consider repetitions of the normal experiment $\mathcal{E}(\sigma)$, above, with prior
distribution $p_\tau$. After $n$ observations with mean $\bar{x}$, which is a sufficient statistic,
it is easy to verify that the posterior distribution of $\theta$ is normal with mean
$(n\tau^2 \bar{x} + \sigma^2 \mu) / (n\tau^2 + \sigma^2)$ and variance $\sigma^2\tau^2 / (n\tau^2 + \sigma^2)$. The posterior information
will therefore be $-\tfrac{1}{2} \log \{2\pi e\, \sigma^2\tau^2 / (n\tau^2 + \sigma^2)\}$ and sampling will continue as
long as this quantity is less than $\delta$, or, equivalently, until

$n \ge \dfrac{2\pi e^{2\delta + 1} \sigma^2 \tau^2 - \sigma^2}{\tau^2}.$

Thus the optimum sequential scheme is of fixed sample size, given by the above
expression. For large $\tau^2$, corresponding to small prior knowledge, the fixed sample
size is approximately $n = K\sigma^2$, where $K = 2\pi e^{2\delta + 1}$. Thus the scheme is equivalent
to sampling until the variance of the sample mean is sufficiently small.
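
The fixed sample size can be computed from the bound above. The sketch below returns the smallest integer n satisfying it and checks that the posterior information first reaches delta at that n; the numerical values are hypothetical and the function names are not from the paper.

```python
import math

def posterior_information(n, sigma, tau):
    """-(1/2) log{2 pi e sigma^2 tau^2 / (n tau^2 + sigma^2)} after n observations."""
    return -0.5 * math.log(2.0 * math.pi * math.e * sigma**2 * tau**2
                           / (n * tau**2 + sigma**2))

def fixed_sample_size(delta, sigma, tau):
    """Smallest n with n >= (2 pi e^{2 delta + 1} sigma^2 tau^2 - sigma^2) / tau^2."""
    bound = (2.0 * math.pi * math.exp(2.0 * delta + 1.0) * sigma**2 * tau**2
             - sigma**2) / tau**2
    return max(0, math.ceil(bound))

def vague_prior_size(delta, sigma):
    """Approximation n = K sigma^2 with K = 2 pi e^{2 delta + 1}, for large tau^2."""
    return 2.0 * math.pi * math.exp(2.0 * delta + 1.0) * sigma**2

# Hypothetical settings: sigma = tau = 1 and delta = 0.
N = fixed_sample_size(0.0, 1.0, 1.0)
```

With these settings the bound is 2 pi e - 1, a little over 16, so `N` is 17; the vague-prior approximation gives about 17.08, which shows how little the prior contributes once tau is not small relative to sigma.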

As a final example, consider the case of repeated binomial trials. In the
experiment to be considered $X = (0, 1)$, $\Theta$ is the unit interval $0 \le \theta \le 1$, and
$p(1 \mid \theta) = \theta$. The situation where the prior distribution is concentrated on a
finite number of points is covered by the results above. We therefore consider
densities over the whole interval of $\theta$ and, to simplify the calculations, confine
attention to the family

(22)    $p_{ab}(\theta) = \theta^{a-1}(1 - \theta)^{b-1}\, \Gamma(a + b) / \Gamma(a)\Gamma(b),$

with $a$ and $b$ positive. This family of densities has the property that if the prior
distribution is $p_{ab}(\theta)$, then the posterior distribution after a single binomial trial
has been performed is $p_{a+1,b}(\theta)$ or $p_{a,b+1}(\theta)$, according as $x = 1$ or 0, respectively
(a fact which the reader can easily verify). Simple calculation shows that

$I_\theta\, p_{ab}(\theta) = \ln \{\Gamma(a + b) / \Gamma(a)\Gamma(b)\} + (a - 1)[\psi(a) - \psi(a + b)] + (b - 1)[\psi(b) - \psi(a + b)],$

where

$\psi(x) = d \ln \Gamma(x) / dx.$

This complicated expression can be simplified for large values of both $a$ and $b$
by use of the asymptotic formulas

$\ln \Gamma(x) \sim \tfrac{1}{2} \ln 2\pi - x + (x - \tfrac{1}{2}) \ln x$

and

$\psi(x) \sim \ln x - 1/2x.$

We obtain

$I_\theta\, p_{ab}(\theta) \sim \tfrac{1}{2} \ln \{(a + b)^3 / ab\} - \tfrac{1}{2} \ln 2\pi - \tfrac{1}{2}.$
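
The exact expression and its asymptotic form can be compared numerically. The sketch below uses the standard library `math.lgamma` for the logarithm of the gamma function and a small hand-written `digamma` (recurrence followed by the usual asymptotic series), since the standard library does not provide one; the function names and the test point a = b = 20 are choices made for this illustration.

```python
import math

def digamma(x):
    """psi(x) = d ln Gamma(x) / dx for x > 0.

    Uses psi(x) = psi(x + 1) - 1/x to shift x above 6, then the asymptotic series
    ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6).
    """
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def beta_information(a, b):
    """Exact integral of p log p for the density (22) with parameters a, b."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * (digamma(a) - digamma(a + b))
            + (b - 1) * (digamma(b) - digamma(a + b)))

def beta_information_asymptotic(a, b):
    """(1/2) ln{(a + b)^3 / ab} - (1/2) ln 2 pi - 1/2, valid for large a and b."""
    return 0.5 * math.log((a + b) ** 3 / (a * b)) - 0.5 * math.log(2 * math.pi) - 0.5

EXACT = beta_information(20.0, 20.0)
APPROX = beta_information_asymptotic(20.0, 20.0)
UNIFORM = beta_information(1.0, 1.0)   # the uniform prior carries zero information
```

At a = b = 20 the two values differ only in the second decimal place, which is in line with the statement that the approximation is intended for large a and b.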


It follows that the curve in the plane of $a$ and $b$ along which $I_\theta\, p_{ab}(\theta)$ is constant
is given approximately, for large values of $a$ and $b$, by the curve

$(a + b)^3 = \lambda ab$

for some constant $\lambda$. The general form of this curve is shown in Fig. 2 by a
continuous line for $a, b > 10$; the broken extension shows the general form of the
curve outside this range, as found by numerical computation.

Suppose the prior distribution has $a = a_0$, $b = b_0$. Then, after a sample of
size $n$ has produced $r$ values of $x = 1$ and $n - r$ of $x = 0$, the posterior
distribution will have $a = a_0 + r$, $b = b_0 + n - r$. The experimentation can be
represented in the $(a, b)$-plane by starting at $P = (a_0, b_0)$ and forming a path
by moving one unit along the $a$-axis for each value $x = 1$ and one unit along
the $b$-axis for each value $x = 0$. Sampling will cease when the path intersects
the curve corresponding to the amount of information required. If prior knowledge
suggests that $\theta$ is small, then presumably one would take $a_0$ to be small
and $b_0$ large, in comparison (for example, the point $P$ in the figure). Ignorance
about $\theta$ presumably corresponds to $a_0 = b_0 = 1$, or, at least, a point with small
$a_0$ and $b_0$.

We conclude by making a few comments on the boundary curve shown in
Fig. 2, based on the assumption that $a_0 = b_0 = 1$. The most prominent feature
is perhaps the sharp decrease in the critical value of $b$ as $a$ approaches one (the
curve AB in the figure). Repetitions of one value of $x$, in this case $x = 0$, result
in a greater accumulation of information than a mixture of both values. To cite
a numerical instance: 6 occurrences of the value $x = 0$ are about as informative
as 11 occurrences of $x = 0$ with one occurrence of $x = 1$, or 14 of $x = 0$ with
two of $x = 1$. (The sample sizes are 6, 12, and 16, respectively.) This agrees
with the "common-sense" feeling adduced by the consideration that if the same
thing continually happens, say the sun rises each morning, then we are much
better informed than we would be if there was known to be even a single
nonoccurrence. In the contrary case, when $\theta$ is about $\tfrac{1}{2}$, the part CD of the curve is
relevant, and is approximated by the fixed sample size scheme with boundary
$a + b$ = constant. The part BC of the curve can also be approximated by
the straight-line boundary $b$ = constant. This would be appropriate if $\theta$ were
about $\tfrac{1}{4}$ (but not so small that the sharp curve AB was relevant) and would
correspond to sampling until $b$ values of $x = 0$ had been observed. If $x = 1$
corresponds to a "defective," this is the same as sampling, when defectives are
rare, until the number of nondefectives has reached a preassigned number, and
may be contrasted with inverse binomial sampling where the situation is similar
but the rule is in terms of defectives.
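
The numerical instance quoted above can be reproduced from the exact expression for the information of the density (22). The sketch below assumes the uniform prior a_0 = b_0 = 1 and integer counts, for which the differences of the psi function reduce to finite harmonic sums; the function names are illustrative.

```python
import math

def beta_information_int(a, b):
    """Integral of p log p for the density (22) with positive integer a and b.

    For integers, psi(a + b) - psi(a) = 1/a + 1/(a + 1) + ... + 1/(a + b - 1).
    """
    def tail(lo):
        return sum(1.0 / k for k in range(lo, a + b))
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            - (a - 1) * tail(a) - (b - 1) * tail(b))

def posterior_information(ones, zeros, a0=1, b0=1):
    """Information after observing `ones` values x = 1 and `zeros` values x = 0."""
    return beta_information_int(a0 + ones, b0 + zeros)

SIX_ZEROS = posterior_information(0, 6)           # sample size 6
ELEVEN_AND_ONE = posterior_information(1, 11)     # sample size 12
FOURTEEN_AND_TWO = posterior_information(2, 14)   # sample size 16
EVEN_SPLIT = posterior_information(3, 3)          # sample size 6, equal counts
```

The first three values come out close to 1.09, 1.11, and 1.11 respectively, so the three samples are indeed about equally informative, whereas six observations split evenly give a much smaller value.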
PROPERTIES OF SOME TWO-SAMPLE TESTS BASED ON A PARTICULAR
MEASURE OF DISCREPANCY


By L. H. WEGNER
The RAND Corporation


1. Introduction and summary. Let $F$ and $G$ be continuous univariate cdf's.
For testing the hypothesis $F = G$ against general alternatives, E. Lehmann [4]
has proposed and found certain properties of a test based on the particular
measure of discrepancy $\int (F - G)^2\, d[(F + G)/2]$. In this note will be given
some additional properties of Lehmann's test (cf. also [8]) and a closely related
test proposed by Mood [2].


2. The test statistics. Let $X_1, \ldots, X_m$ and $Y_1, \ldots, Y_n$ be independent
random samples from populations with continuous cdf's $F$ and $G$ respectively.
Let $\binom{m}{2}\binom{n}{2} Q_{m,n}$ be the number of quadruples $(X_i, X_j, Y_k, Y_l)$, $i < j$, $k < l$,
for which either the maximum of the X's is less than the minimum of the Y's or
the maximum of the Y's is less than the minimum of the X's. Then

(2.1)    $Q_{m,n} = \left[\binom{m}{2}\binom{n}{2}\right]^{-1} \sum_{i<j} \sum_{k<l} \chi_{ijkl},$

where $\chi_{ijkl}$ is one if $X_i, X_j$ are both less than, or both greater than, $Y_k, Y_l$
and is zero otherwise. Lehmann [4] has
shown that $Q_{m,n}$ is a minimum variance unbiased estimate of the functional,

(2.2)    $Q(F, G) = \tfrac{1}{3} + 2 \int (F - G)^2\, d\!\left(\dfrac{F + G}{2}\right).$

Replacing $F$ and $G$ in (2.2) by the corresponding sample cumulative distribution
functions, say $S$ and $T$, yields the statistic,

(2.3)    $D = \int (S - T)^2\, d\!\left(\dfrac{S + T}{2}\right),$

which is the symmetric version of a test statistic originally proposed by Mood [2],

(2.4)    $d = \int (S - T)^2\, dT.$

The critical region for each of the two-sample tests corresponding to the above
test statistics consists of the region in the $(m + n)$-dimensional sample space
(or equivalently, the arrangements of the $m$ X's and $n$ Y's) for which the test
statistic takes on its largest values. In [8] the distribution of $Q_{m,n}$ when $F = G$
has been tabled for a selection of small sample sizes.


Received January 17, 1955. 

3. The two-sample statistics expressed in terms of ranks. In order to see 
more closely the similarities among the above statistics, it is enlightening to 
express them in terms of ranks. Let $R_i$ be the rank of the ith ordered X and $r_j$ 
the rank of the jth ordered Y in the combined ordered sample of m X's and n Y's. 
Lehmann [4] has given the following relation between $Q_{mn}$ and these ranks, 


(3.1) $$\binom{m}{2}\binom{n}{2} Q_{mn} = \sum_{j=1}^{n} \left[ (n - j) \binom{r_j - j}{2} + (j - 1) \binom{m - r_j + j}{2} \right].$$
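
As an illustration of (3.1), the following minimal sketch computes $Q_{mn}$ twice: once by counting quadruples as in Section 2, and once from the ranks of the ordered Y's. The function names and the sample values are assumptions made only for this example, and the samples are assumed to contain no ties.

```python
from itertools import combinations
from math import comb

def q_bruteforce(xs, ys):
    """Q_mn from the definition: share of quadruples with both X's on one side of both Y's."""
    hits = 0
    for xi, xj in combinations(xs, 2):
        for yk, yl in combinations(ys, 2):
            if max(xi, xj) < min(yk, yl) or max(yk, yl) < min(xi, xj):
                hits += 1
    return hits / (comb(len(xs), 2) * comb(len(ys), 2))

def y_ranks(xs, ys):
    """Ranks r_1 < ... < r_n of the ordered Y's in the combined sample (no ties assumed)."""
    combined = sorted(xs + ys)
    return [combined.index(y) + 1 for y in sorted(ys)]

def q_from_ranks(xs, ys):
    """Q_mn from the rank relation (3.1)."""
    m, n = len(xs), len(ys)
    total = 0
    for j, r in enumerate(y_ranks(xs, ys), start=1):
        # r - j X's lie below the j-th ordered Y, m - r + j lie above it
        total += (n - j) * comb(r - j, 2) + (j - 1) * comb(m - r + j, 2)
    return total / (comb(m, 2) * comb(n, 2))

xs = [0.3, 1.7, 2.2, 4.1]   # hypothetical X sample
ys = [0.9, 2.5, 3.3]        # hypothetical Y sample
print(q_bruteforce(xs, ys), q_from_ranks(xs, ys))
```

The rank version needs only one pass over the ordered Y's, whereas the direct count visits every quadruple.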


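From the definition of d, (2.4), we have 

(3.2) $$d = \frac{1}{n} \sum_{j=1}^{n} \left( \frac{j}{n} - \frac{r_j - j}{m} \right)^2,$$

and, by symmetry, 

(3.3) $$D = \frac{1}{2n} \sum_{j=1}^{n} \left( \frac{j}{n} - \frac{r_j - j}{m} \right)^2 + \frac{1}{2m} \sum_{i=1}^{m} \left( \frac{i}{m} - \frac{R_i - i}{n} \right)^2.$$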
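
The rank expressions (3.2) and (3.3) can be checked against the definition (2.4) with a short sketch. Exact rational arithmetic is used so that the comparison is not blurred by rounding; the helper names are assumptions introduced for this example, and ties are assumed absent.

```python
from fractions import Fraction

def ranks(sample, other):
    """Ranks of the ordered `sample` values within the combined sample (no ties assumed)."""
    combined = sorted(sample + other)
    return [combined.index(v) + 1 for v in sorted(sample)]

def d_from_ranks(xs, ys):
    """(3.2): d = (1/n) * sum_j (j/n - (r_j - j)/m)^2."""
    m, n = len(xs), len(ys)
    r = ranks(ys, xs)
    return sum((Fraction(j, n) - Fraction(rj - j, m)) ** 2
               for j, rj in enumerate(r, start=1)) / n

def d_direct(xs, ys):
    """(2.4): integrate (S - T)^2 against dT, i.e. average (S - T)^2 over the Y's."""
    m, n = len(xs), len(ys)
    S = lambda t: Fraction(sum(x <= t for x in xs), m)
    T = lambda t: Fraction(sum(y <= t for y in ys), n)
    return sum((S(y) - T(y)) ** 2 for y in ys) / n

def big_d(xs, ys):
    """(3.3): D is the average of d and of the same quantity with X and Y interchanged."""
    return (d_from_ranks(xs, ys) + d_from_ranks(ys, xs)) / 2

xs = [3, 17, 22, 41]   # hypothetical X sample
ys = [9, 25, 33]       # hypothetical Y sample
print(d_from_ranks(xs, ys), d_direct(xs, ys), big_d(xs, ys))
```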


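The above relations, after expansion and reduction, become 

(3.4) $$2\binom{m}{2}\binom{n}{2} Q_{mn} = \sum_{j=1}^{n} \left[ (n - 1) r_j^2 - 2(m + n - 2) j r_j - (n - 2m + 1) r_j \right] + (n + 2m - 3) \frac{n(n + 1)(2n + 1)}{6} + (n + m^2 - 3m + 1) \frac{n(n + 1)}{2} - mn(m - 1),$$

(3.5) $$m^2 n^2 d = \sum_{j=1}^{n} \left[ n r_j^2 - 2(m + n) j r_j \right] + (m + n)^2 \frac{(n + 1)(2n + 1)}{6},$$

(3.6) $$2 m^2 n^2 D = \sum_{j=1}^{n} \left[ n r_j^2 - 2(m + n) j r_j \right] + \sum_{i=1}^{m} \left[ m R_i^2 - 2(m + n) i R_i \right] + (m + n)^2 \left[ \frac{(n + 1)(2n + 1)}{6} + \frac{(m + 1)(2m + 1)}{6} \right].$$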


4. Relations among tests when m = n. From the definition of $Q_{mn}$, it follows 
that we may replace $r_j$ by $R_i$ in (3.4) if we interchange m and n. Upon adding 
the resultant expression for $2\binom{m}{2}\binom{n}{2} Q_{mn}$ to (3.4), setting m equal to n, and 
employing the identities 

$$\sum_{j=1}^{n} r_j + \sum_{i=1}^{n} R_i = n(2n + 1)$$

and 

$$\sum_{j=1}^{n} r_j^2 + \sum_{i=1}^{n} R_i^2 = \tfrac{1}{3} n(2n + 1)(4n + 1),$$

we obtain the following relation, 

(4.1) $$\binom{n}{2}^2 Q_{nn} = \frac{n - 1}{4} \left[ \tfrac{4}{3} n(2n + 1)(n + 1) + n^2(3n + 1) - 4 \sum_{j=1}^{n} j(r_j + R_j) \right].$$

Proceeding in an analogous manner, we obtain from (3.6), 

$$2n^4 D = n \left[ \tfrac{1}{3} n(2n + 1)(8n + 5) - 4 \sum_{j=1}^{n} j(r_j + R_j) \right].$$

Thus, when m = n, $Q_{nn}$ and D are related linearly and tests using large values 
of these statistics as critical regions are identical. 
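
Eliminating the common sum $\sum j(r_j + R_j)$ from the two relations above gives an explicit linear relation, $n^2(n - 1) Q_{nn} = 2n^3 D + n(n^2 - 3n - 1)/3$. This form is derived here for illustration and is not displayed in the text. The sketch below checks it over every arrangement of n X's and n Y's for a few small n; the function names are assumptions of the example.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def q_and_big_d(x_pos, n):
    """Exact Q_nn and D for one arrangement; x_pos holds the ranks of the X's among 1..2n."""
    R = sorted(x_pos)
    r = [p for p in range(1, 2 * n + 1) if p not in x_pos]
    count = sum((n - j) * comb(rj - j, 2) + (j - 1) * comb(n - rj + j, 2)
                for j, rj in enumerate(r, start=1))            # relation (3.1) with m = n
    q = Fraction(count, comb(n, 2) ** 2)
    d_y = sum((Fraction(j, n) - Fraction(rj - j, n)) ** 2
              for j, rj in enumerate(r, start=1)) / n          # relation (3.2)
    d_x = sum((Fraction(i, n) - Fraction(Ri - i, n)) ** 2
              for i, Ri in enumerate(R, start=1)) / n
    return q, (d_y + d_x) / 2                                  # relation (3.3)

def linear_relation_holds(n):
    """True if n^2 (n-1) Q = 2 n^3 D + n (n^2 - 3n - 1)/3 for every arrangement."""
    const = Fraction(n * (n * n - 3 * n - 1), 3)
    for x_pos in combinations(range(1, 2 * n + 1), n):
        q, big_d = q_and_big_d(x_pos, n)
        if n * n * (n - 1) * q != 2 * n ** 3 * big_d + const:
            return False
    return True

print([linear_relation_holds(n) for n in (2, 3, 4)])
```

Because the coefficient of D is positive, ordering the arrangements by $Q_{nn}$ or by D gives the same critical regions, which is the statement made in the text.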


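5. Means and variances. 
a. Means. From (2.2) the mean of $Q_{mn}$ is 

(5.1) $$E(Q_{mn}) = \tfrac{1}{3} + 2 \int (F - G)^2 \, d\!\left(\frac{F + G}{2}\right).$$

From (2.4), 

(5.2) $$E(d) = E\left[ \int (S - T)^2 \, dT \right] = E\left( \int S^2 \, dT \right) - 2E\left( \int ST \, dT \right) + \frac{(n + 1)(2n + 1)}{6n^2}.$$

Since E(T) = G and $E(S^2) = [F(1 - F)/m] + F^2$, the first term on the right 
of (5.2) becomes, with the aid of Fubini's theorem, 

$$E\left( \int S^2 \, dT \right) = \int \left[ \frac{F(1 - F)}{m} + F^2 \right] dG.$$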

On the application of Fubini's theorem and a special integration by parts (vide 
[7], p. 102), the second term on the right of (5.2) reduces to 

$$2E\left( \int ST \, dT \right) = \frac{n - 1}{n} - \frac{n - 1}{n} \int G^2 \, dF + \frac{2}{n} \int F \, dG.$$

Thus E(d) may be written 

$$E(d) = \frac{m - 1}{m} \int F^2 \, dG + \frac{n - 1}{n} \int G^2 \, dF + \frac{n - 2m}{mn} \int F \, dG - \frac{n - 1}{n} + \frac{(n + 1)(2n + 1)}{6n^2},$$

which, after substituting the identity, 

$$\int F^2 \, dG + \int G^2 \, dF = \tfrac{2}{3} + \int (F - G)^2 \, d[(F + G)/2],$$

becomes 

(5.3) $$E(d) = \int (F - G)^2 \, d\!\left(\frac{F + G}{2}\right) - \frac{1}{m} \int F^2 \, dG - \frac{1}{n} \int G^2 \, dF + \frac{n - 2m}{mn} \int F \, dG + \frac{9n + 1}{6n^2}.$$


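From (5.3) and the symmetry of D, we obtain for E(D), 

(5.4) $$E(D) = \int (F - G)^2 \, d\!\left(\frac{F + G}{2}\right) - \frac{1}{m} \int F^2 \, dG - \frac{1}{n} \int G^2 \, dF + \frac{n - 2m}{2mn} \int F \, dG + \frac{9n + 1}{12n^2} + \frac{m - 2n}{2mn} \int G \, dF + \frac{9m + 1}{12m^2}.$$

When F = G, (5.1), (5.3), and (5.4) reduce to 

(5.5) $$E(Q_{mn}) = \tfrac{1}{3},$$

(5.6) $$E(d) = \frac{1}{6} \left[ \frac{m + n}{mn} + \frac{1}{n^2} \right],$$

(5.7) $$E(D) = \frac{1}{6} \left[ \frac{m + n}{mn} + \frac{1}{2m^2} + \frac{1}{2n^2} \right].$$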

b. Variances. A method for finding the variance of $Q_{mn}$ for general F and G 
has been given by Sundrum [8]. When F = G, he obtained 

(5.8) $$\sigma^2(Q_{mn}) = \frac{1}{45} \binom{m}{2}^{-1} \binom{n}{2}^{-1} \left[ (m + n)(m + n - 1) - 2 \right].$$

In the following there will be outlined a procedure (cf. Hoeffding [3]) for obtain- 
ing the variance of a $U_N$ statistic as defined in Theorem 6.5 ($Q_{mn}$ is a particular 
$U_N$ statistic). This procedure also provides a result which will be needed in the 
proof of a later theorem. 

Set 

$$t_{ij}(x_1, \dots, x_i, y_1, \dots, y_j) = E\, t(x_1, \dots, x_i, X_{i+1}, \dots, X_r; y_1, \dots, y_j, Y_{j+1}, \dots, Y_r),$$
$$\zeta_{00} = 0,$$
$$\zeta_{ij} = E[t_{ij}^2(X_1, \dots, X_i, Y_1, \dots, Y_j)] - \theta^2, \qquad i, j = 0, 1, \dots, r.$$

Let $(s_1, \dots, s_r)$, $(s_1', \dots, s_r')$, $(t_1, \dots, t_r)$, and $(t_1', \dots, t_r')$ be four sets of r 
different integers, $1 \le s_i, s_i' \le m$, $1 \le t_i, t_i' \le n$, and let a and b be the 
number of integers common to the sets of s's and t's respectively. Then, from the 
symmetry of $t(x_1, \dots, x_r, y_1, \dots, y_r)$, it follows that 

(5.9) $$E[t(X_{s_1}, \dots, X_{s_r}, Y_{t_1}, \dots, Y_{t_r})\, t(X_{s_1'}, \dots, X_{s_r'}, Y_{t_1'}, \dots, Y_{t_r'})] - \theta^2 = \zeta_{ab}.$$
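Thus, the variance of $U_N$ can be written 

(5.10) $$\sigma^2(U_N) = \binom{m}{r}^{-2} \binom{n}{r}^{-2} E\left\{ \sum [t(X_{s_1}, \dots, X_{s_r}, Y_{t_1}, \dots, Y_{t_r}) - \theta] \right\}^2 = \binom{m}{r}^{-2} \binom{n}{r}^{-2} \sum_{a=0}^{r} \sum_{b=0}^{r} \sum\nolimits^{(a,b)} \left\{ E[t(X_{s_1}, \dots, X_{s_r}, Y_{t_1}, \dots, Y_{t_r})\, t(X_{s_1'}, \dots, X_{s_r'}, Y_{t_1'}, \dots, Y_{t_r'})] - \theta^2 \right\},$$

where $\sum^{(a,b)}$ stands for summation over all subscripts such that 
$1 \le s_1 < \dots < s_r \le m$, $1 \le s_1' < \dots < s_r' \le m$, $1 \le t_1 < \dots < t_r \le n$, $1 \le t_1' < \dots < t_r' \le n$, 
and exactly a equations $s_i = s_j'$ and b equations $t_i = t_j'$ are satisfied. 

From (5.9) each term in $\sum^{(a,b)}$ is equal to $\zeta_{ab}$. The number of terms in $\sum^{(a,b)}$ 
is $\binom{m}{r}\binom{r}{a}\binom{m-r}{r-a}\binom{n}{r}\binom{r}{b}\binom{n-r}{r-b}$, so that (5.10) becomes 

(5.11) $$\sigma^2(U_N) = \binom{m}{r}^{-1} \binom{n}{r}^{-1} \sum_{a=0}^{r} \sum_{b=0}^{r} \binom{r}{a} \binom{m-r}{r-a} \binom{r}{b} \binom{n-r}{r-b} \zeta_{ab}.$$

To find the variance of $Q_{mn}$ by the above method, we must first obtain the 
$\zeta_{ab}$'s, a, b = 0, 1, 2, and then combine them according to (5.11). 

Since $\zeta_{00} = 0$, we have the following additional result, which will be needed 
in the following section. 

(5.12) $$\sigma^2(U_N) \le \binom{m}{r}^{-1} \binom{n}{r}^{-1} \left[ \binom{m}{r}\binom{n}{r} - \binom{m-r}{r}\binom{n-r}{r} \right] \max_{a,b} (\zeta_{ab}) = o(1) \quad \text{as } \min(m, n) \to \infty,$$

when $\max(\zeta_{ab}) < \infty$. 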
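From (3.5) the variance of d is 

(5.13) $$\sigma^2(d) = m^{-4} n^{-4} \left[ n^2 \operatorname{var}\left( \sum_{j=1}^{n} r_j^2 \right) + 4(m + n)^2 \operatorname{var}\left( \sum_{j=1}^{n} j r_j \right) - 4n(m + n) \operatorname{cov}\left( \sum_{j=1}^{n} r_j^2, \sum_{j=1}^{n} j r_j \right) \right].$$

When F = G, the distribution of $r_j$ and the joint distribution of $r_j$ and $r_k$ (j < k) 
are easily seen to be 

(5.14) $$f(r_j) = \binom{m+n}{n}^{-1} \binom{r_j - 1}{j - 1} \binom{m + n - r_j}{n - j}, \qquad j \le r_j \le m + j,$$

and 

(5.15) $$f(r_j, r_k) = \binom{m+n}{n}^{-1} \binom{r_j - 1}{j - 1} \binom{r_k - r_j - 1}{k - j - 1} \binom{m + n - r_k}{n - k}, \qquad j \le r_j \le r_k - (k - j) \le m + j, \quad 1 \le j < k \le n.$$

Equations (5.14) and (5.15) may be used to find the three terms on the right of 
(5.13). After lengthy but straightforward calculations, we have 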
(5.16) $$\operatorname{var}\left( \sum_{j=1}^{n} r_j^2 \right) = \frac{1}{180} mn(m + n + 1)(2m + 2n + 1)(8m + 8n + 11),$$

(5.17) $$\operatorname{var}\left( \sum_{j=1}^{n} j r_j \right) = \frac{1}{180} mn(m + n + 1)(2n + 3)(2n + 1),$$

(5.18) $$\operatorname{cov}\left( \sum_{j=1}^{n} r_j^2, \sum_{j=1}^{n} j r_j \right) = \frac{1}{360} mn(m + n + 1)(16n^2 + 16mn + 14m + 31n + 13).$$

The following relation, which will be used in Theorem 6.1, can be obtained in a 
similar manner, 

(5.19) $$\operatorname{var}\left( \sum_{j=1}^{n} r_j \right) = \frac{1}{12} mn(m + n + 1).$$

Substituting (5.16), (5.17), and (5.18) in (5.13) and simplifying, we obtain for 
the variance of d when F = G, 

(5.20) $$\sigma^2(d) = \frac{(m + n)(m + n + 1)}{45 m^2 n^2} + \frac{m + n + 1}{180 m^3 n^3} (12m^2 - 3n^2 - 2mn).$$
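
The null variances (5.8) and (5.20) can be checked in the same way as the means, by enumerating all equally likely arrangements for small m and n. The sketch below is only a numerical check under that assumption (F = G, no ties); the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def q_and_d(y_ranks, m, n):
    """Exact (Q_mn, d) for one arrangement, given the ranks of the n Y's among 1..m+n."""
    r = sorted(y_ranks)
    count = sum((n - j) * comb(rj - j, 2) + (j - 1) * comb(m - rj + j, 2)
                for j, rj in enumerate(r, start=1))              # relation (3.1)
    q = Fraction(count, comb(m, 2) * comb(n, 2))
    d = sum((Fraction(j, n) - Fraction(rj - j, m)) ** 2
            for j, rj in enumerate(r, start=1)) / n              # relation (3.2)
    return q, d

def null_variances(m, n):
    """Exact variances of Q_mn and d over all C(m+n, n) equally likely arrangements."""
    values = [q_and_d(arr, m, n) for arr in combinations(range(1, m + n + 1), n)]
    result = []
    for column in zip(*values):
        mean = sum(column) / len(column)
        result.append(sum((v - mean) ** 2 for v in column) / len(column))
    return result

def stated_variances(m, n):
    """Right-hand sides of (5.8) and (5.20)."""
    total = m + n
    var_q = Fraction(total * (total - 1) - 2, 45 * comb(m, 2) * comb(n, 2))
    var_d = (Fraction(total * (total + 1), 45 * m ** 2 * n ** 2)
             + Fraction((total + 1) * (12 * m * m - 3 * n * n - 2 * m * n),
                        180 * m ** 3 * n ** 3))
    return [var_q, var_d]

print(null_variances(3, 3) == stated_variances(3, 3))
```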

6. Limiting distributions. 

a. Under the null hypothesis. The following two theorems are concerned with 
the limiting distribution of $Q_{mn}$, d, and D under the null hypothesis F = G. 

THEOREM 6.1. If F = G and $m/n \to c > 0$ as $n \to \infty$, the statistics 

$$\frac{1}{2} \frac{mn}{m + n} [Q_{mn} - E(Q_{mn})], \qquad \frac{mn}{m + n} [D - E(D)], \qquad \frac{mn}{m + n} [d - E(d)]$$

have the same limiting distribution. 

THEOREM 6.2.$^1$ If F = G and $m/n \to c > 0$ as $n \to \infty$, the statistic 
$[mn/(m + n)][d - E(d)]$ has the same limiting distribution as $n\omega^2 - E(n\omega^2)$, 
where $\omega^2$ is the von Mises statistic. (The limiting distribution of $n\omega^2$ is tabled in [1]; 
$E(n\omega^2) = \tfrac{1}{6}$.) 

It follows from the above theorems that $\tfrac{1}{2}[mn/(m + n)][Q_{mn} - E(Q_{mn})]$ has 
the same limiting distribution as $n\omega^2 - E(n\omega^2)$. In Figure 1 are compared the 
distribution of $Q_{5,5}$ and the limiting distribution of $n\omega^2$ drawn with the appro- 
priate scales. 

PROOF OF THEOREM 6.1. From equations (3.4) and (3.5), we may write 

(6.1) $$2\binom{m}{2}\binom{n}{2} Q_{mn} - m^2 n^2 d = -\sum_{j=1}^{n} r_j^2 + 4 \sum_{j=1}^{n} j r_j - (n - 2m + 1) \sum_{j=1}^{n} r_j + O(n^4).$$

From (5.16), (5.17), and (5.19), each of the terms on the right has variance $O(n^5)$. 


1 Theorem 6.2 is due to M. Rosenblatt [6]. 
[FIG. 1. Cumulative probability plotted against $100\,Q_{5,5}$.] 


Thus, aside from terms which converge to zero in probability, 

(6.2) $$\frac{1}{2} \frac{mn}{m + n} [Q_{mn} - E(Q_{mn})] = \frac{mn}{m + n} \cdot \frac{1}{4} \binom{m}{2}^{-1} \binom{n}{2}^{-1} m^2 n^2 [d - E(d)] = \frac{mn}{m + n} [d - E(d)] + \frac{mn}{m + n} \cdot \frac{m + n - 1}{(m - 1)(n - 1)} [d - E(d)].$$

From (5.20), $\sigma^2(d) = O(1/n^2)$. Thus the second term on the right of (6.2) con- 
verges to zero in probability. 

The proof that $[mn/(m + n)][D - E(D)]$ has the same limiting distribution 
as $\tfrac{1}{2}[mn/(m + n)][Q_{mn} - E(Q_{mn})]$ is analogous and will be omitted. 

b. Under the alternative hypothesis. An important subclass of the class of con- 
tinuous cdf's is the class of strictly increasing continuous cdf's. The following 
two theorems are concerned with the problem of finding the limiting distribu- 
tion of $Q_{mn}$, d, and D when F and G are in this subclass and $F \ne G$. 

THEOREM 6.3. If $m/n \to c > 0$ as $n \to \infty$, then the statistics 

$$\tfrac{1}{2}[mn/(m + n)]^{1/2}[Q_{mn} - E(Q_{mn})], \quad [mn/(m + n)]^{1/2}[d - E(d)], \quad \text{and} \quad [mn/(m + n)]^{1/2}[D - E(D)]$$

have the same limiting distribution. 
PROOF. It follows from (6.1) and the inequalities, 

$$\sum_{j=1}^{n} j r_j \le \sum_{j=1}^{n} r_j^2 \le n(m + n)^2 \quad \text{and} \quad \sum_{j=1}^{n} r_j \le n(m + n),$$

that we may write, aside from terms which converge to zero in probability, 

(6.3) $$\left[ \frac{mn}{m + n} \right]^{1/2} [d - E(d)] = \left[ \frac{mn}{m + n} \right]^{1/2} 2 m^{-2} n^{-2} \binom{m}{2}\binom{n}{2} [Q_{mn} - E(Q_{mn})] = \frac{1}{2} \left[ \frac{mn}{m + n} \right]^{1/2} [Q_{mn} - E(Q_{mn})] - \left[ \frac{mn}{m + n} \right]^{1/2} \frac{m + n - 1}{2mn} [Q_{mn} - E(Q_{mn})].$$

From (5.12) it follows that $\sigma^2(Q_{mn}) = o(1)$ as $\min(m, n) \to \infty$, so that the 
second term on the right of (6.3) converges to zero in probability. 

The proof that $[mn/(m + n)]^{1/2}[D - E(D)]$ has the same limiting distribution 
as $\tfrac{1}{2}[mn/(m + n)]^{1/2}[Q_{mn} - E(Q_{mn})]$ is analogous and will be omitted. 

THEOREM 6.4.$^2$ If $m/n \to c > 0$ as $n \to \infty$, then the statistic 
$[mn/(m + n)]^{1/2}[Q_{mn} - E(Q_{mn})]$ has a normal limiting distribution. Excluding 
F and G for which either F = G or $\int F \, dG = 0$ or 1, the class for which nondegen- 
eracy occurs includes all continuous F and G which are strictly increasing through- 
out their range of variation. 

In the proof of Theorem 6.4 we shall need the following theorem of Leh- 
mann's [4]. 

THEOREM 6.5. Let $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ be independently dis- 
tributed random samples from the distributions F and G respectively. Let 
$t(x_1, \dots, x_r, y_1, \dots, y_r)$ be symmetric in the x's alone and in the y's alone. 
Suppose that 

$$E[t(X_1, \dots, X_r, Y_1, \dots, Y_r)] = \theta(F, G) = \theta,$$
$$E[t(X_1, \dots, X_r, Y_1, \dots, Y_r)^2] = M < \infty.$$

Let m/n = c and let n be sufficiently large so that $r \le \min(m, n)$. Define 

$$U_N = \binom{m}{r}^{-1} \binom{n}{r}^{-1} \sum t(X_{\alpha_1}, \dots, X_{\alpha_r}, Y_{\beta_1}, \dots, Y_{\beta_r}),$$

where the summation is extended over all subscripts 
$1 \le \alpha_1 < \dots < \alpha_r \le m$, $1 \le \beta_1 < \dots < \beta_r \le n$. 

Then, as $n \to \infty$, $[mn/(m + n)]^{1/2}(U_N - \theta)$ is asymptotically normally dis- 
tributed; furthermore, if we set 

$$\psi_1(x_1) = E[t(x_1, X_2, \dots, X_r, Y_1, \dots, Y_r)] - \theta,$$
$$\psi_2(y_1) = E[t(X_1, \dots, X_r, y_1, Y_2, \dots, Y_r)] - \theta,$$

then the limiting distribution of $[mn/(m + n)]^{1/2}(U_N - \theta)$ is nondegenerate pro- 
vided 

$$E[\psi_1^2(X_1)] + E[\psi_2^2(Y_1)] > 0.$$


2 Theorem 6.4 is an amended version of a statement by E. Lehmann [4, p. 173], which 
did not sufficiently restrict F and G for nondegeneracy. 

PROOF OF THEOREM 6.4. Set $t(X_i, X_j, Y_k, Y_l)$ equal to $\chi_{ijkl}$ [which is de- 
fined following (2.1)]. Then $Q_{mn}$ is seen to be equivalent to $U_N$ in Theorem 6.5, 
where $\theta = Q(F, G)$ and r = 2. The first statement of Theorem 6.4 follows imme- 
diately. To prove the second statement, we apply the second part of Lehmann's 
theorem. We have 

$$\psi_1(x_1) = E[t(x_1, X_2, Y_1, Y_2)] - \theta$$
$$= \int_{-\infty}^{x_1} \int_{y}^{\infty} 2G(y) \, dF(x) \, dG(y) + \int_{x_1}^{\infty} \int_{-\infty}^{y} 2[1 - G(y)] \, dF(x) \, dG(y) - \theta$$
$$= 2 \int_{-\infty}^{x_1} (1 - F) G \, dG + 2 \int_{x_1}^{\infty} F(1 - G) \, dG - \theta$$
$$= 2 \int_{-\infty}^{x_1} (G - F) \, dG + 2 \int_{-\infty}^{\infty} F(1 - G) \, dG - \theta.$$

Set $I(x) = \int_{-\infty}^{x} (G - F) \, dG$. Then $E[\psi_1^2(X_1)] = 0$ implies that 

(6.4) $$I(X_1) = E[I(X_1)]$$

with probability one (with respect to F). 

Suppose now that the restrictions of the second statement of Theorem 6.4 
hold. This implies that there exist two points $x_0$ and $x_0'$, $x_0 < x_0'$, which are 
points of increase of both F and G. With no loss in generality it may be assumed 
that $G(x) - F(x) \ge \delta > 0$ for x in the interval $(x_0, x_0')$. It follows that $I(x_0') > I(x_0)$ 
so that (6.4) can not hold with probability one. 


7. Consistency and unbiasedness. 

a. Consistency. For the class of continuous cumulative distribution functions, 
the test based on $Q_{mn}$ of the hypothesis F = G against the alternatives $F \ne G$ 
has been shown by Lehmann [4] to be consistent at each level of significance 
when $\min(m, n) \to \infty$. 

With the aid of the theorems on limiting distributions and the fact that the 
means of d and D are linear functions of $\int (F - G)^2 \, d(F + G)$ plus a term which 
is o(1) as $\min(m, n) \to \infty$, it readily follows that the tests based on d and D 
are consistent under the conditions of the above paragraph provided the addi- 
tional restriction $m/n \to c > 0$ as $n \to \infty$ is imposed. 

b. Unbiasedness. That the tests based on $Q_{mn}$, d, and D are not unbiased tests 
of the hypothesis F = G against all continuous alternatives $F \ne G$ and all 
m and n is shown by the following example. 

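Let $F_1$ and $G_1$ be cdf's with the probability density functions 

$$f_1(x) = \tfrac{1}{2}, \quad 0 \le x \le 1, \; 2 \le x \le 3,$$
$$\phantom{f_1(x)} = 0, \quad \text{otherwise},$$

and 

$$g_1(y) = 1, \quad 1 \le y \le 2,$$
$$\phantom{g_1(y)} = 0, \quad \text{otherwise}.$$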
In the (m + n)-dimensional sample space let $W_{mn}^{(1)}$ be the region for which 
$\max(x_1, \dots, x_m) < \min(y_1, \dots, y_n)$ and let $W_{mn}^{(2)}$ be the region for which 
$\min(x_1, \dots, x_m) > \max(y_1, \dots, y_n)$. Then 

$$P(W_{mn}^{(1)} \mid F = G) = P(W_{mn}^{(2)} \mid F = G) = \binom{m + n}{n}^{-1},$$

$$P(W_{mn}^{(1)} \mid F = F_1, G = G_1) = P(W_{mn}^{(2)} \mid F = F_1, G = G_1) = (\tfrac{1}{2})^m.$$

Since, for fixed n and sufficiently large m, $\binom{m + n}{n} < 2^m$, there exist $m_1$ and $n_1$ 
such that both 

$$P(W_{m_1 n_1}^{(1)} \mid F = F_1, G = G_1) < P(W_{m_1 n_1}^{(1)} \mid F = G),$$

$$P(W_{m_1 n_1}^{(2)} \mid F = F_1, G = G_1) < P(W_{m_1 n_1}^{(2)} \mid F = G),$$

so that any test of the hypothesis F = G having $W_{m_1 n_1}^{(1)}$, $W_{m_1 n_1}^{(2)}$, or $W_{m_1 n_1}^{(1)} \cup W_{m_1 n_1}^{(2)}$ 
as a critical region will be biased against the alternative $F = F_1$, $G = G_1$. 

Since critical regions for the tests based on $Q_{mn}$, d, and D are regions yielding 
large values of these statistics, it can be seen by examining the maxima of these 
statistics over the possible arrangements of X's and Y's that for each test and 
every m and n, one of $W_{mn}^{(1)}$, $W_{mn}^{(2)}$, or $W_{mn}^{(1)} \cup W_{mn}^{(2)}$ is a possible critical region. 
Thus, each of these tests is biased against the alternative $F = F_1$, $G = G_1$, 
when $m = m_1$, $n = n_1$. 
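
The existence of suitable $m_1$ and $n_1$ rests only on the inequality $\binom{m + n}{n} < 2^m$. As a small illustration, the sketch below searches, for a given n, for the smallest m at which the inequality holds. The function name and the search bound `m_max` are assumptions of this example.

```python
from math import comb

def smallest_biased_m(n, m_max=1000):
    """Smallest m <= m_max with C(m+n, n) < 2**m, i.e. with
    P(W | F = F_1, G = G_1) = (1/2)**m below P(W | F = G) = 1 / C(m+n, n)."""
    for m in range(1, m_max + 1):
        if comb(m + n, n) < 2 ** m:
            return m
    return None  # no such m found within the search bound

for n in (2, 3, 4):
    print(n, smallest_biased_m(n))
```

For n = 2 the search stops at m = 4, since $\binom{6}{2} = 15 < 16 = 2^4$ while $\binom{5}{2} = 10 > 8 = 2^3$.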


8. The power of the test based on $Q_{mn}$ for a particular class of alternatives. 
In [5], Lehmann has discussed the power of several two-sample distribution- 
free tests for the particular class of alternatives $G = F^k$ (k = 2, 3, ...). One of 
the tests considered by Lehmann was the two-sided version of Wilcoxon's rank 
sum test, which we shall use here as a basis of comparison. With the aid of Leh- 
mann's technique, the exact power of the $Q_{mn}$ test was found for m = n = 4 
to be 0.19 against the alternative $G = F^2$ and 0.32 against $G = F^3$, which results 
are identical with the corresponding results for Wilcoxon's test. For larger m 
and n, the approximate power of the $Q_{mn}$ test was obtained by use of the approxi- 
mate distributions indicated in Section 6. Against the alternative $G = F^2$, the 
approximate power was slightly larger than that of Wilcoxon's test for $5 \le m = n < 40$ 
and slightly smaller for m = n > 40. Against the alternative $G = F^3$, 
the approximate power was essentially the same as that of Wilcoxon's test for 
$5 \le m = n < 15$ and slightly smaller for $m = n \ge 15$. 


REFERENCES 


[1] T. W. ANDERSON AND D. A. DARLING, "Asymptotic theory of certain 'goodness of fit' 
criteria based on stochastic processes," Ann. Math. Stat., Vol. 23 (1952), pp. 
193-213. 

[2] W. J. DIXON, "A criterion for testing the hypothesis that two samples are from the 
same population," Ann. Math. Stat., Vol. 11 (1940), pp. 199-204. 

[3] W. HOEFFDING, "A class of statistics with asymptotic normal distributions," Ann. 
Math. Stat., Vol. 19 (1948), pp. 293-325. 

[4] E. L. LEHMANN, "Consistency and unbiasedness of certain nonparametric tests," Ann. 
Math. Stat., Vol. 22 (1951), pp. 165-179. 

[5] E. L. LEHMANN, "The power of rank tests," Ann. Math. Stat., Vol. 24 (1953), pp. 23-44. 

[6] M. ROSENBLATT, "Limit theorems associated with variants of the von Mises statistics," 
Ann. Math. Stat., Vol. 23 (1952), pp. 617-624. 

[7] S. SAKS, Theory of the Integral, New York, Hafner Publishing Co., 1937. 

[8] R. M. SUNDRUM, "On Lehmann's two-sample test," Ann. Math. Stat., Vol. 25 (1954), 
pp. 139-146. 


POWER OF ANALYSIS OF VARIANCE TEST PROCEDURES FOR 
CERTAIN INCOMPLETELY SPECIFIED MODELS, I 


By HELEN BOZIVICH, T. A. BANCROFT, AND H. O. HARTLEY 


Statistical Laboratory, Iowa State College 

TABLE OF CONTENTS 


1. Introduction 
1.1 Description of pooling procedures 
1.2 More precise formulation 
1.3 Reduction of mixed models to random models 
1.4 Related papers and objectives of present investigation 
2. Exact and approximate formulas for power. Component of variance model 
2.1 Mathematical formulation of the pooling procedure 
2.2 Integral expressions for the power 
2.3 Exact formulas 
2.31 Series formulas 
2.32 Recurrence formulas 
2.4 Approximate formulas 
2.5 Theory of reduction of mixed model to random model 
2.6 Application of derived formulas 
3. Discussion of power and size and comparison of test procedures 
3.1 Type of recommendations attempted 
3.2 Size 
3.3 Frequency of pooling 
3.4 Power 
References cited 
Appendix: Figures and tables 

1. Introduction. 

1.1 Description of pooling procedures. The simplest situation of a pooling pro- 
cedure for testing hypotheses using analysis of variance procedures may be 
described as follows: We are given three mean squares, $V_1$, $V_2$, $V_3$, based on 
$n_1$, $n_2$, and $n_3$ degrees of freedom, respectively, and designated as treatment 
mean square ($V_3$), the error mean square ($V_2$), and the doubtful error mean square 
($V_1$). It is desired to test a null hypothesis involving $V_3$, which can be tested 
by comparing $V_3$ with $V_2$ by the F-test. It is now suspected that $V_1$ is also a 
measure of the error variance, that is, has the same expectation as $V_2$. It is de- 
cided, therefore, to first perform a preliminary test of significance by comparing 
$V_2$ against $V_1$ by the F-test, and, if this turns out to be nonsignificant, to use 
the pooled mean square $V = (n_1 V_1 + n_2 V_2)/(n_1 + n_2)$ as error for comparison 
with $V_3$ in the final F-test. In the case that $V_2$ is significantly different from $V_1$, 
however, use $V_2$ as error in the final F-test. This test procedure is usually re- 
ferred to as the sometimes-pool procedure. In corresponding terminology the 
single test $V_3/V_2$ is called the never-pool test, and the procedure of employing 
$V = (n_1 V_1 + n_2 V_2)/(n_1 + n_2)$ as error and always testing only $V_3/V$ is called the 
always-pool test. If the level of significance for the preliminary test is 100%, the 
sometimes-pool test becomes the never-pool test; if, on the other hand, the level 
is 0%, the sometimes-pool test becomes the always-pool test. With the some- 
times-pool test, the precise nature of the final F-test is, therefore, not determined 
in advance, but it depends on the relative magnitude of the observed mean 
squares $V_1$ and $V_2$. 

When the analysis of variance and associated tests of significance were first 
developed by R. A. Fisher, such procedures were not advocated. Indeed, Fisher’s 
original description of analysis of variance tests clearly stipulated that for every 
well-designed experiment there can only be one correct analysis and the test(s) 
of significance are completely determined before the experimental results are 
available. With Fisher the appropriate test of significance is determined by a 
specification of the population from which the experimental data were sampled. 
We may speak in this case of an analysis determined by a completely specified 
model.* However, in experimental design, situations frequently arise in which 
the model is not completely specified. Furthermore, with the wider application 
of analysis of variance to operational research and to the study of routine data, 
statisticians are often faced with analyzing data which have not resulted from 
a designed experiment, and in these situations the model is often incompletely 
specified. In such cases preliminary tests of significance have been used, in 
practice, as an aid in choosing an appropriate specification from which valid 
subsequent inferences may be drawn. In particular, preliminary tests of sig- 
nificance procedures have been used in the past in an attempt to increase the 
number of degrees of freedom associated with the error in a final F-test, thereby 
apparently increasing the sensitivity of the final F-test. Justification for the use 
of such methods has been made apparently on intuitive grounds. 

The procedures described above and similar pooling procedures can be re- 
garded as dealing with these situations sequentially in two stages: the preliminary 
stage in which inferences are drawn about the model, and the final stage in which 
inferences are drawn about the parameter(s) involved in the main hypothesis. 

The purpose of the present study is to critically examine the consequences of 
certain pooling procedures with regard to the resulting errors of the first and 
second kind, for certain random and mixed models. Finally, on the basis of 
these results, we shall attempt some general recommendations on the advisability 
or otherwise of using them. 


* For the general formulation of model specification, see Section 1.2. 
TABLE 1 

Component of variance model with $\sigma_b^2 > 0$: analysis of variance 

Source of variation     df                    Mean square   Exp. mean square 

Between A               $n_3 = q - 1$         $V_3$         $\sigma_3^2 = \sigma_z^2 + s\sigma_b^2 + rs\sigma_a^2$ 
Between B within A      $n_2 = q(r - 1)$      $V_2$         $\sigma_2^2 = \sigma_z^2 + s\sigma_b^2$ 
Within B                $n_1 = qr(s - 1)$     $V_1$         $\sigma_1^2 = \sigma_z^2$ 


1.2 More precise formulation. Let us assume the component of variance model 

(1) $$x_{ijk} = \mu + a_i + b_{ij} + z_{ijk},$$

where i = 1, 2, ..., q; j = 1, 2, ..., r; k = 1, 2, ..., s; $a_i$ is $N(0, \sigma_a^2)$, $b_{ij}$ is 
$N(0, \sigma_b^2)$, and $z_{ijk}$ is $N(0, \sigma_z^2)$. We wish to test a hypothesis concerning $a_i$. If 
$\sigma_b^2 \ge 0$ and $\sigma_a^2 \ge 0$, then 

(2) $$x_{ijk} = \mu + a_i + b_{ij} + z_{ijk} \quad \text{for } \sigma_b^2 > 0,$$
$$x_{ijk} = \mu + a_i + z_{ijk} \quad \text{for } \sigma_b^2 = 0.$$

In this case (1) is said to be an incompletely specified model. If, however, 
$\sigma_b^2 > 0$, then 

(3) $$x_{ijk} = \mu + a_i + b_{ij} + z_{ijk},$$

and (1) is completely specified. Similarly, if $\sigma_b^2 = 0$, 

(4) $$x_{ijk} = \mu + a_i + z_{ijk},$$

and again (1) is completely specified. 

We wish to test the hypothesis $H_0$: $\sigma_a^2 = 0$ against the alternative $H_1$: $\sigma_a^2 > 0$. 
Now let us assume we have the completely specified model given by (3). Then 
$\sigma_b^2 > 0$, and we obtain the analysis of variance given in Table 1. Then it follows 
from the likelihood ratio principle that the appropriate test procedure is to cal- 
culate the test statistic 

(5) $$F_0 = \frac{V_3}{V_2}$$

and to reject $H_0$ if $F_0 \ge F_\alpha(n_3, n_2)$, where $\alpha$ is the prescribed level of significance. 
This test is the never-pool test. 

Next let us assume the completely specified model given by (4). Now the 
expected mean squares of Table 1 no longer include the $\sigma_b^2$ component, since 
$\sigma_b^2 = 0$. Application of the likelihood ratio test procedure to this model for the 
test of $H_0$ gives us the test criterion 

(6) $$F_0 = \frac{(n_1 + n_2) V_3}{n_1 V_1 + n_2 V_2}$$

and the rule to reject $H_0$ if $F_0 \ge F_\alpha(n_3, n_1 + n_2)$. This gives us the always-pool 
test. 
Finally, we assume $\sigma_b^2 \ge 0$ and, hence, are confronted with the incompletely 
specified model given by (2). Ordinarily this model (2) might be assumed when 
there exists some uncertainty as to whether $\sigma_b^2 = 0$ or $\sigma_b^2 > 0$. In such cases of 
incomplete specification, attempts are often made to resolve the uncertainty by 
first testing the hypothesis that $\sigma_b^2 = 0$. The model finally chosen and, hence, 
the final test (test of $H_0$) depend upon the outcome of this original test. When 
the original and final tests are performed on the same set of data, the original 
test is referred to as a preliminary test of significance. In our example the pre- 
liminary test becomes the test of $H_0'$: $\sigma_b^2 = 0$ against $H_1'$: $\sigma_b^2 > 0$. Again, a likeli- 
hood ratio test procedure is available for this preliminary test. The statistic 
$F_0' = V_2/V_1$ is calculated and $H_0'$ rejected at the level $\alpha_1$ (usually different from $\alpha$) 
if $F_0' \ge F_{\alpha_1}(n_2, n_1)$. If $H_0'$ is rejected, the non-pooling test procedure indicated 
by (5) is used for the final test. If $H_0'$ is not rejected, the pooling procedure indi- 
cated by (6) is used for the final test. 

It should be noted that when the final test is carried out, the model is assumed 
to be completely specified, that is, to be either model (3) or model (4), according 
as the preliminary test is found to be significant or not significant, respectively. 

The essential features of the sometimes-pool procedure as applied to the com- 
ponent of variance model described may be summarized as follows: 

(i) The three mean squares $V_i$ are independently distributed as $\chi_i^2 \sigma_i^2 / n_i$, 
where $\chi_i^2$ is the (central) $\chi^2$ statistic for $n_i$ degrees of freedom; 

(ii) The main purpose of the analysis is to test the null hypothesis $\sigma_3^2 = \sigma_2^2$ 
against the alternative $\sigma_3^2 > \sigma_2^2$; 

(iii) The error mean square $V_2$ has an expectation $\sigma_2^2$ which is greater than or 
equal to the expectation, $\sigma_1^2$, of the doubtful error mean square $V_1$, which may 
or may not be pooled. 
It is clear that the above hierarchical classification is not the only analysis of 
variance situation giving rise to the above conditions. As an example we may 
quote the two-way classification with both factors random and cell repetition. 
Here $V_1$ would play the part of the within-cell mean square, while $V_2$ would be 
represented by the residual in the two-way analysis. 

Aside from the preliminary test for the complete specification of the model, 
it is to be noted that we have made the assumptions usually made in the custom- 
ary analysis of variance, namely those associated with an additive analysis of 
variance model. It is sometimes correctly argued that these latter assumptions 
may not be justified in certain situations, and in others may represent only an 
approximation to the actual mechanisms generating the data. This issue is, of 
course, one affecting the analysis of variance tests in general, and has led to ex- 
tensive studies of the validity of these tests when the basic assumptions are not 
completely satisfied. If there is some doubt regarding the detailed assumptions 
for the analysis of variance model, it should be possible also to formulate the 
problem as being incompletely specified in these other respects. We are not con- 
cerned with these issues here. In extending the analysis of variance theory based 
on the assumption of linear models, our results are, strictly speaking, limited to 
situations in which these other assumptions are satisfied. However, the classical 
analysis of variance tests have been found to be remarkably robust, that is, not 
sensitive to certain deviations from the basic assumptions.$^5$ We expect, therefore, 
that our present results will likewise be applicable as useful approximations to 
a wider class of situations. 

TABLE 2 

Mixed model example: analysis of variance 

Source of variation      df                         Mean square   Exp. mean square 

Between rations          $n_3 = k - 1$              $V_3$         $\sigma_3^2 = \sigma_z^2 + m\sigma_d^2 + mn\theta_{(a)}$ 
Replicates x rations     $n_2 = (k - 1)(n - 1)$     $V_2$         $\sigma_2^2 = \sigma_z^2 + m\sigma_d^2$ 
Within pens              $n_1 = nk(m - 1)$          $V_1$         $\sigma_1^2 = \sigma_z^2$ 

1.3 Reduction of mixed models to random models. The preceding section has 
been devoted to random models only. Another frequently occurring type of 
model is the mixed model, in which one of the factors is fixed and the other 
factors are random, and the hypothesis of interest is concerned with the fixed 
factor. A typical example of an experiment giving rise to this type of model is 
a randomized block experiment in which k rations are fed to each of m animals 
of a pen in each of n replicates. Then a suitable model for these data is given by 

$$x_{tij} = \mu + a_t + b_i + d_{ti} + z_{tij},$$

where the replicate variates $b_i$, error variates $d_{ti}$, and within-pen error variates 
$z_{tij}$ are assumed to be random samples from the respective normal populations 
$N(0, \sigma_b^2)$, $N(0, \sigma_d^2)$, and $N(0, \sigma_z^2)$, while the ration means $a_t$ are fixed parameters. 
The analysis of variance based on this model is shown in Table 2. Here $\theta_{(a)} = 
\sum_t (a_t - \bar{a})^2/(k - 1)$. Following the same general consideration of Section 1.2, 
it is shown in Section 2.5 how the sometimes-pool procedure for this model can 
be reduced to that of the random model. 

1.4 Related papers and objectives of present investigation. The problem to 
be discussed here is from a general area of preliminary tests of significance. Work 
in this area includes studies by Bancroft [1], [2]; Mosteller [12]; Paull [14], [15]; 
Kitagawa [10]; Bechhofer [4]; and Bennett [5]. Paull [14], [15] studied the size 
and the power for the component of variance model described in Section 1.2. 
However, he was able to express the size and power in closed form for the case 
$n_3 = 2$ only, so that all comparisons made by him are restricted to that value 
of $n_3$. 

The object of the present study is to provide the necessary extension of Paull's 
investigation to cover all of the important degrees of freedom combinations 
occurring in the analyses of variance under discussion. This extension was made 
possible by 

(i) the development of the power integrals as series formulas for even values 
of the degrees of freedom $n_1$, $n_2$, and $n_3$; 


5 See, e.g., Box ([6], [7]) and the numerous references to earlier work given there. 
(ii) the derivation of recurrence formulas for the power for even values of 
$n_1$, $n_2$, and $n_3$; 

(iii) the development of approximate formulas valid for large degrees of free- 
dom for even values of $n_1$, $n_2$, and $n_3$. 

TABLE 3 

Component of variance model: analysis of variance 

Source of variation   Mean square   df      Exp. mean square 

Treatments            $V_3$         $n_3$   $\sigma_3^2$ 
Error                 $V_2$         $n_2$   $\sigma_2^2$ 
Doubtful error        $V_1$         $n_1$   $\sigma_1^2$ 


2. Exact and approximate formulas for power. Component of variance model. 

2.1 Mathematical formulation of the pooling procedure. We now derive formulas 
for the power and size of the pooling procedure applied to the component of 
variance model described in Section 1. Let us first state the procedure in mathe- 
matical terms. We are given an analysis of variance as shown in Table 3. 

We are interested in testing the hypothesis $H_0$: $\sigma_3^2 = \sigma_2^2$ against the alternative 
$H_1$: $\sigma_3^2 > \sigma_2^2$ when it is known that $\sigma_3^2 \ge \sigma_2^2 \ge \sigma_1^2$. We assume the sums of squares 
$n_i V_i$ are independently distributed as $\chi_i^2 \sigma_i^2$, where $\chi_i^2$ is the central $\chi^2$ statistic 
based on $n_i$ degrees of freedom. The test procedure with sometimes pooling $V_2$ 
and $V_1$ is then as follows: Reject $H_0$ if 

(7) either $\{V_2/V_1 \ge F_{n_2, n_1}(\alpha_1)$ and $V_3/V_2 \ge F_{n_3, n_2}(\alpha_2)\}$ 
    or $\{V_2/V_1 < F_{n_2, n_1}(\alpha_1)$ and $V_3/V \ge F_{n_3, n_1 + n_2}(\alpha_3)\}$, 

where $V = (n_1 V_1 + n_2 V_2)/(n_1 + n_2)$ and $F_{n_i, n_j}(\alpha)$ is the upper $100\alpha\%$ point 
of the F-distribution with numerator df = $n_i$ and denominator df = $n_j$. 
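
The decision rule (7) can be written out as a small function. The sketch below is an illustration only: the three critical values are assumed to be supplied by the caller (for example, read from F tables for the chosen $\alpha_1$, $\alpha_2$, $\alpha_3$), and the function and argument names are hypothetical.

```python
def sometimes_pool_reject(v1, v2, v3, n1, n2, crit_prelim, crit_never, crit_pool):
    """Return True if H_0 is rejected by the sometimes-pool procedure (7).

    v1, v2, v3   : doubtful error, error, and treatment mean squares
    n1, n2       : degrees of freedom of v1 and v2
    crit_prelim  : F_{n2,n1}(alpha_1), critical value of the preliminary test
    crit_never   : F_{n3,n2}(alpha_2), critical value when v2 alone is the error
    crit_pool    : F_{n3,n1+n2}(alpha_3), critical value when the pooled error is used
    """
    if v2 / v1 >= crit_prelim:
        # Preliminary test significant: do not pool, use v2 as error.
        return v3 / v2 >= crit_never
    # Preliminary test not significant: pool v1 and v2.
    pooled = (n1 * v1 + n2 * v2) / (n1 + n2)
    return v3 / pooled >= crit_pool

# Hypothetical mean squares and critical values, for illustration only.
print(sometimes_pool_reject(1.0, 2.0, 10.0, n1=10, n2=5,
                            crit_prelim=3.0, crit_never=3.0, crit_pool=3.0))
```

Setting `crit_prelim` to 0 reproduces the never-pool test, and setting it to infinity reproduces the always-pool test, in line with the 100% and 0% preliminary levels described in Section 1.1.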

The probability, P, of rejecting $H_0$, which in general is the power of the test 
procedure, is a function of the degrees of freedom, $n_1$, $n_2$, and $n_3$, the ratios, 
$\theta_{32} = \sigma_3^2/\sigma_2^2$ and $\theta_{21} = \sigma_2^2/\sigma_1^2$, and the levels of significance employed, $\alpha_1$, $\alpha_2$, and 
$\alpha_3$. In the special case when $\theta_{32} = 1$, this power is equal to the size of the test, 
i.e., the probability of type one error. In general the power P is obtained as the 
sum of two components corresponding to the mutually exclusive alternatives 
headed by either and or in its definition above, namely, 

(8) $P_1 = \Pr\{V_2/V_1 \ge F_{n_2, n_1}(\alpha_1)$ and $V_3/V_2 \ge F_{n_3, n_2}(\alpha_2)\}$, 
(9) $P_2 = \Pr\{V_2/V_1 < F_{n_2, n_1}(\alpha_1)$ and $V_3/V \ge F_{n_3, n_1 + n_2}(\alpha_3)\}$. 
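
Before turning to exact formulas, $P = P_1 + P_2$ can be approximated by simulation, which gives a rough check on any closed-form result. The sketch below is a Monte Carlo illustration, not the method of this paper: it draws the mean squares as scaled chi-square variates (a chi-square with n degrees of freedom being a gamma variate with shape n/2 and scale 2), takes $\sigma_1^2 = 1$ without loss of generality, and assumes the three critical values are supplied by the caller. All names are hypothetical.

```python
import random

def simulate_power(n1, n2, n3, theta21, theta32,
                   crit_prelim, crit_never, crit_pool, reps=20000, seed=0):
    """Monte Carlo estimate of P = P_1 + P_2 for the sometimes-pool procedure (7)."""
    rng = random.Random(seed)
    sigma1_sq = 1.0
    sigma2_sq = theta21 * sigma1_sq
    sigma3_sq = theta32 * sigma2_sq
    rejections = 0
    for _ in range(reps):
        # n_i V_i / sigma_i^2 is chi-square with n_i d.f., i.e. Gamma(n_i / 2, scale 2).
        v1 = sigma1_sq * rng.gammavariate(n1 / 2, 2.0) / n1
        v2 = sigma2_sq * rng.gammavariate(n2 / 2, 2.0) / n2
        v3 = sigma3_sq * rng.gammavariate(n3 / 2, 2.0) / n3
        if v2 / v1 >= crit_prelim:
            reject = v3 / v2 >= crit_never          # event counted in P_1
        else:
            pooled = (n1 * v1 + n2 * v2) / (n1 + n2)
            reject = v3 / pooled >= crit_pool       # event counted in P_2
        rejections += reject
    return rejections / reps

# Hypothetical settings; the critical values here are placeholders, not tabled F points.
print(simulate_power(8, 4, 2, theta21=1.0, theta32=1.0,
                     crit_prelim=2.0, crit_never=7.0, crit_pool=4.0))
```

With $\theta_{32} = 1$ the estimate approximates the size of the procedure; larger values of $\theta_{32}$ give points on the power curve.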
2.2 Integral expressions for the power. Definite integrals for $P_1$ and $P_2$ will 
now be derived. The joint density of $V_1$, $V_2$, and $V_3$ is given by 

$$c_1 V_1^{n_1/2 - 1} V_2^{n_2/2 - 1} V_3^{n_3/2 - 1} \exp\left\{ -\frac{1}{2} \left( \frac{n_1 V_1}{\sigma_1^2} + \frac{n_2 V_2}{\sigma_2^2} + \frac{n_3 V_3}{\sigma_3^2} \right) \right\},$$

where $c_1$ is a constant independent of $V_1$, $V_2$, and $V_3$. By introducing new vari- 
ates, 

$$u = \frac{n_2 V_2}{\theta_{21} n_1 V_1}, \qquad v = \frac{n_3 V_3}{\theta_{32} n_2 V_2}, \qquad w = \frac{n_1 V_1}{\sigma_1^2},$$

and integrating out w, we obtain for the joint distribution of u and v 

$$f(u, v) = \frac{k \, u^{(n_2 + n_3)/2 - 1} v^{n_3/2 - 1}}{(1 + u + uv)^{(n_1 + n_2 + n_3)/2}},$$

where 

$$k = \frac{1}{B(n_1/2, n_2/2) \, B(n_3/2, (n_1 + n_2)/2)}.$$
The probability of rejecting the hypothesis Ho is obtained by integrating f(u, v) 
over the two ranges of variation of u and v which correspond to the two alterna- 
tives either and/or of definition (7). These ranges are respectively given by either 
uy 
4 :U< ow, Svu<o 


Boy ' 29 


u3(1 7 Bau) — 
where 


(10) — From(o), U2 = wn Pw ng (az) 


and 


0 Nz 


Us = ——— Fy, .n4nq (as). 
3 nm + Ne MyM +nq\F3 


Hence the formulas for the two power components become 


| f(u, v) dv du 
a “d 


[ f(u, v) dv du 


e(1462;u)/u 


where 


(11) 
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2.3 Exact formulas. 
2.3.1 Series formulas. 


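$$P_1 = k \int_a^{\infty} \int_d^{\infty} \frac{u^{(n_2+n_3)/2-1}\, v^{n_3/2-1}}{(1 + u + uv)^{(n_1+n_2+n_3)/2}}\, dv\, du.$$

The transformation $z = (1 + u)/(1 + u + uv)$ yields

$$P_1 = k \int_a^{\infty} \int_0^{x_1} \frac{z^{(n_1+n_2)/2-1} (1 - z)^{n_3/2-1}\, u^{n_2/2-1}}{(1 + u)^{(n_1+n_2)/2}}\, dz\, du, \qquad x_1 = \frac{1 + u}{1 + u(1 + d)}.$$

The binomial expansion of $(1 - z)^{n_3/2-1}$ gives us

$$P_1 = k \int_a^{\infty} \int_0^{x_1} \frac{f(z)\, u^{n_2/2-1}}{(1 + u)^{(n_1+n_2)/2}}\, dz\, du,$$

where

$$f(z) = \sum_{j=0}^{n_3/2-1} (-1)^j \binom{n_3/2 - 1}{j} z^{(n_1+n_2)/2+j-1}.$$

Upon performing the integrations with respect to $u$ and $z$, we obtain

$$P_1 = \sum_{j=1}^{n_3/2} \frac{(-1)^{j-1} \binom{n_3/2-1}{j-1}}{[(n_1+n_2)/2 + j - 1]\, B(n_1/2, n_2/2)\, B(n_3/2, (n_1+n_2)/2)\, (1 + d)^{n_2/2}} \sum_{i=0}^{j-1} \binom{j-1}{i} \frac{B(n_1/2 + j - i - 1,\ n_2/2 + i)\, I_{x_2}(n_1/2 + j - i - 1,\ n_2/2 + i)}{(1 + d)^{i}},$$

where

$$x_2 = \frac{1}{1 + a(1 + d)}. \tag{12}$$

We now consider

$$P_2 = k \int_0^{a} \int_{(c+bu)/u}^{\infty} \frac{u^{(n_2+n_3)/2-1}\, v^{n_3/2-1}}{(1 + u + uv)^{(n_1+n_2+n_3)/2}}\, dv\, du,$$

where the lower limit of integration for $v$ is $[c(1 + \theta_{21} u)]/u = (c + bu)/u$, so that $b = c\,\theta_{21}$.

Using procedures similar to those used in deriving $P_1$, we obtain

$$P_2 = \sum_{j=1}^{n_3/2} \frac{(-1)^{j-1} \binom{n_3/2-1}{j-1}}{[(n_1+n_2)/2 + j - 1]\, B(n_1/2, n_2/2)\, B(n_3/2, (n_1+n_2)/2)\, (1 + b)^{n_2/2} (1 + c)^{n_1/2}} \sum_{i=0}^{j-1} \binom{j-1}{i} \frac{B(n_2/2 + i,\ n_1/2 + j - i - 1)\, I_{x_3}(n_2/2 + i,\ n_1/2 + j - i - 1)}{(1 + b)^{i} (1 + c)^{j-i-1}},$$

where

$$x_3 = \frac{a(1 + b)}{1 + c + a(1 + b)}. \tag{14}$$
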

2.3.2 Recurrence formulas. Integrating $P_1$ (as originally given in Section 2.2) by parts with respect to $v$, we obtain

$$P_1(n_3) = \frac{d^{\,n_3/2-1}\, k}{(n_1+n_2+n_3)/2 - 1} \int_a^{\infty} \frac{u^{(n_2+n_3)/2-2}}{(1 + u(1 + d))^{(n_1+n_2+n_3)/2-1}}\, du + P_1(n_3 - 2).$$

Upon integrating with respect to $u$, we obtain

$$P_1(n_3) = \frac{d^{\,n_3/2-1}\, I_{x_2}(n_1/2,\ (n_2+n_3)/2 - 1)}{(n_3/2 - 1)\, B(n_3/2 - 1,\ n_2/2)\, (1 + d)^{(n_2+n_3)/2-1}} + P_1(n_3 - 2), \tag{15}$$

where $x_2$ is given by (12). For the set of initial values at $n_3 = 2$ it is found that

$$P_1(2) = \frac{I_{x_2}(n_1/2,\ n_2/2)}{(1 + d)^{n_2/2}}. \tag{16}$$
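
The recurrence (15) with starting value (16) translates directly into a loop over even $n_3$. The following is a minimal sketch, not the authors' computing procedure: the function names are hypothetical, and the regularized incomplete Beta function $I_x(p, q)$ is evaluated by a standard continued-fraction routine written out here so that only the standard library is needed.

```python
import math

def _betacf(p, q, x):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = p + q, p + 1.0, p - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    if abs(d) < tiny:
        d = tiny
    d = 1.0 / d
    h = d
    for m in range(1, 301):
        m2 = 2 * m
        for aa in (m * (q - m) * x / ((qam + m2) * (p + m2)),
                   -(p + m) * (qab + m) * x / ((p + m2) * (qap + m2))):
            d = 1.0 + aa * d
            if abs(d) < tiny:
                d = tiny
            c = 1.0 + aa / c
            if abs(c) < tiny:
                c = tiny
            d = 1.0 / d
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return h

def betainc_reg(p, q, x):
    """Regularized incomplete beta function I_x(p, q)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
                     + p * math.log(x) + q * math.log1p(-x))
    if x < (p + 1.0) / (p + q + 2.0):
        return front * _betacf(p, q, x) / p
    return 1.0 - front * _betacf(q, p, 1.0 - x) / q

def p1_recurrence(n1, n2, n3, a, d):
    """P1 for even n3 >= 2 via (16) and repeated use of (15)."""
    if n3 < 2 or n3 % 2:
        raise ValueError("n3 must be an even integer >= 2")
    x2 = 1.0 / (1.0 + a * (1.0 + d))                              # (12)
    p = betainc_reg(n1 / 2.0, n2 / 2.0, x2) / (1.0 + d) ** (n2 / 2.0)  # (16)
    for m in range(4, n3 + 1, 2):                                 # (15)
        h = m / 2.0 - 1.0
        beta = math.exp(math.lgamma(h) + math.lgamma(n2 / 2.0)
                        - math.lgamma(h + n2 / 2.0))
        p += (d ** h * betainc_reg(n1 / 2.0, (n2 + m) / 2.0 - 1.0, x2)
              / (h * beta * (1.0 + d) ** ((n2 + m) / 2.0 - 1.0)))
    return p
```

Here `a` and `d` are the quantities defined in (11). For $n_1 = n_2 = 2$ the incomplete Beta functions reduce to polynomials in $x_2$, which gives a simple hand check of the loop.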

The recurrence development for $P_2$ is similar but more cumbersome. We obtain the relation

$$P_2(n_1, n_3) = \frac{1}{1 + c}\left\{ \frac{(1/a)^{n_1/2-1}\, I_{x_4}((n_1+n_2)/2 - 1,\ n_3/2)}{((n_1+n_2)/2 - 1)\, B(n_1/2,\ n_2/2)\, (1 + 1/a)^{(n_1+n_2)/2-1}} + c\, P_2(n_1, n_3 - 2) + P_2(n_1 - 2, n_3) \right\}, \tag{17}$$

where

$$x_4 = \frac{1 + (1/a)}{1 + b + (1 + c)/a}. \tag{18}$$

The formulas for the initial values are found to be

$$P_2(n_1, 2) = \frac{I_{x_3}(n_2/2,\ n_1/2)}{(1 + b)^{n_2/2} (1 + c)^{n_1/2}} \tag{19}$$

and

$$P_2(2, n_3) = \frac{1}{1 + c}\left\{ \frac{I_{x_4}(n_2/2,\ n_3/2)}{(1 + 1/a)^{n_2/2}} + c\, P_2(2, n_3 - 2) \right\}, \tag{20}$$

where $x_3$ and $x_4$ are given by (14) and (18), respectively.
2.4 Approximate formulas. We now derive simpler approximate formulas. We first consider $P_2$. Writing $F_1 = F_{n_2,n_1}(\alpha_1)$, $F_2 = F_{n_3,n_2}(\alpha_2)$, $F_3 = F_{n_3,n_1+n_2}(\alpha_3)$, we have

$$P_2 = \Pr\{V_2/V_1 \le F_1 \text{ and } V_3/V \ge F_3\}.$$

As $n_1 \to \infty$ both $V_1 \to \sigma_1^2$ and $V \to \sigma_1^2$ and, in the limit, the two ratios $V_2/V_1$ and $V_3/V$ are independently distributed. It is therefore suggested that for large $n_1$ we use the approximation

$$P_2 \approx \Pr\{V_2/V_1 \le F_1\}\, \Pr\{V_3/V \ge F_3\} = \left[1 - I_{x_1}(\tfrac{1}{2} n_1,\ \tfrac{1}{2} n_2)\right] I_{x_2}(\tfrac{1}{2}(n_1 + n_2),\ \tfrac{1}{2} n_3), \tag{21}$$

where

$$x_1 = 1 \Big/ \left(1 + \frac{1}{\theta_{21}} \cdot \frac{1 - z(\alpha_1)}{z(\alpha_1)}\right),$$

$$x_2 = (n_1 + n_2) \Big/ \left(n_1 + n_2 + \frac{n_2\theta_{21} + n_1}{\theta_{21}\theta_{32}} \cdot \frac{1 - z(\alpha_3)}{z(\alpha_3)}\right),$$

and $z(\alpha_1)$, $z(\alpha_3)$ are respectively the roots, $z$, of $I_z(\tfrac{1}{2} n_1, \tfrac{1}{2} n_2) = \alpha_1$ and $I_z(\tfrac{1}{2}(n_1 + n_2), \tfrac{1}{2} n_3) = \alpha_3$. Here we have used the well-known relation between the incomplete Beta function $I_x(a, b)$ and the F-integral, viz.,

$$\Pr\{F_{\nu_1,\nu_2} \le F_0\} = I_x(\tfrac{1}{2}\nu_1,\ \tfrac{1}{2}\nu_2), \quad \text{with } x = \nu_1 F_0/(\nu_2 + \nu_1 F_0).$$

We have also used the approximation that for large $n_1$, $V$ is approximately distributed as $(n_1\sigma_1^2 + n_2\sigma_2^2)\chi_{n_1+n_2}^2/(n_1 + n_2)^2$.

Next we turn to

$$P_1 = \Pr\{V_2/V_1 \ge F_1 \text{ and } V_3/V_2 \ge F_2\}.$$

Here we could use a similar argument if we were to let $n_2 \to \infty$. This limit would, however, not yield useful results. The important situation in pooling procedures is one in which $n_2$ is moderate or small. Instead we use the well-known normal approximation to $\log V_i$. M. S. Bartlett and D. G. Kendall [3] have shown that $\log V_i$ is approximately $N(\log \sigma_i^2,\ 2/(n_i - 1))$, provided that $n_i$ is not too small. Writing

$$u = \log V_2 - \log V_1 \quad \text{and} \quad z = \log V_3 - \log V_2,$$

it follows that the joint distribution of $u$ and $z$ is approximately bivariate normal with correlation coefficient

$$\rho = -1 \Big/ \left\{\left(1 + \frac{n_2 - 1}{n_1 - 1}\right)\left(1 + \frac{n_2 - 1}{n_3 - 1}\right)\right\}^{1/2}. \tag{22}$$

We may therefore employ the tables of the double probability integral of a bivariate normal surface of K. Pearson [16], Tables VIII and IX. If $x$ and $y$ follow a bivariate normal distribution with both means equal to 0, correlation coefficient $\rho$, and both standard deviations equal to unity, then these tables give the probabilities $P_1(h, k)$ for $x \ge h$ and $y \ge k$. In our case, $\rho$ is given by (22) and $h$ and $k$ by

$$h = \frac{2 z_{n_2,n_1}(\alpha_1) - \log \theta_{21}}{[2/(n_1 - 1) + 2/(n_2 - 1)]^{1/2}}, \qquad k = \frac{2 z_{n_3,n_2}(\alpha_2) - \log \theta_{32}}{[2/(n_2 - 1) + 2/(n_3 - 1)]^{1/2}},$$

where $z_{n_i,n_j}(\alpha)$ is the upper $100\alpha$ per cent point of Fisher's z distribution with numerator degrees of freedom $n_i$ and denominator degrees of freedom $n_j$.⁶
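
The arguments needed to enter a bivariate normal table follow mechanically from (22) and the expressions for $h$ and $k$. As a minimal sketch (the function name and argument names are assumptions, and the Fisher z points `z1` $= z_{n_2,n_1}(\alpha_1)$ and `z2` $= z_{n_3,n_2}(\alpha_2)$ are taken as given by the caller):

```python
import math

def bivariate_normal_inputs(n1, n2, n3, z1, z2, theta21, theta32):
    """Return (rho, h, k) for the normal approximation to P1.

    Uses Var(log V_i) ~ 2/(n_i - 1), so every n_i must exceed 1.
    """
    var_u = 2.0 / (n1 - 1) + 2.0 / (n2 - 1)   # variance of log V2 - log V1
    var_z = 2.0 / (n2 - 1) + 2.0 / (n3 - 1)   # variance of log V3 - log V2
    # Cov(u, z) = -Var(log V2); algebraically the same as (22).
    rho = -(2.0 / (n2 - 1)) / math.sqrt(var_u * var_z)
    h = (2.0 * z1 - math.log(theta21)) / math.sqrt(var_u)
    k = (2.0 * z2 - math.log(theta32)) / math.sqrt(var_z)
    return rho, h, k
```

When the three degrees of freedom are equal, the correlation is $-1/2$ whatever their common value, which serves as a quick check.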

2.5 Theory of reduction of mixed model to random model. Certain mixed models 
of analysis of variance were described in Section 1.3. No new formulas are re- 
quired for these models, as we shall show that the joint distribution of the three 
mean squares is, at least approximately, equal to that of the component of 
variance model. The exact specifications of the distribution for the mixed model 
being considered are as follows. (Primed parameters will be used to specify the 
parameters for the mixed model.) 

(a) The error mean square $V_2$ and the doubtful error mean square $V_1$ are distributed as $\chi_{n_i}^2 \sigma_i^2/n_i$ ($i = 1, 2$), where $\chi_{n_i}^2$ is the central $\chi^2$ statistic with $n_i$ degrees of freedom. On the other hand, the treatment mean square $V_3$ is distributed as $\chi_{n_3}'^{2}\sigma_2^2/n_3$, where $\chi_{n_3}'^{2}$ is the noncentral $\chi^2$ statistic with $n_3$ degrees of freedom and noncentrality parameter

$$\lambda = \frac{n_3(\sigma_3^2 - \sigma_2^2)}{2\sigma_2^2} = \tfrac{1}{2} n_3(\theta_{32}' - 1),$$

where $\theta_{32}' = \sigma_3^2/\sigma_2^2$. $V_1$, $V_2$, and $V_3$ are independent.

(b) The main purpose of the analysis is to test the hypothesis $H_0: \sigma_3^2 = \sigma_2^2$ against the alternative $H_1: \sigma_3^2 > \sigma_2^2$.

(c) The true error mean square, $V_2$, has an expectation $\sigma_2^2$ which is greater than or equal to the expectation, $\sigma_1^2$, of the doubtful error mean square.


The probability $P$ of rejecting $H_0$ is obtained as the sum of the two components,

$$P_1 = \Pr\{V_2/V_1 \ge F_{n_2,n_1}(\alpha_1') \text{ and } V_3/V_2 \ge F_{n_3,n_2}(\alpha_2')\} \tag{23}$$

and

$$P_2 = \Pr\{V_2/V_1 < F_{n_2,n_1}(\alpha_1') \text{ and } V_3/V \ge F_{n_3,n_1+n_2}(\alpha_3')\}. \tag{24}$$

In evaluating these probabilities we use the approximation first used by Patnaik [13]. We replace $\chi_{n_3}'^{2}$ by $c\chi_{\nu_3}^2$, $\chi_{\nu_3}^2$ being the central $\chi^2$ statistic based upon $\nu_3$ degrees of freedom, where

$$\nu_3 = n_3 + \frac{4\lambda^2}{n_3 + 4\lambda}, \qquad c = 1 + \frac{2\lambda}{n_3 + 2\lambda}.$$

Since the use of this approximation reduces the noncentral $\chi^2$ statistic to a central $\chi^2$ statistic, all three statistics are now central.

TABLE 4

Modified parameters for random model corresponding to specified parameters for mixed model

Specified parameters for mixed model        Modified parameters for random model
$n_1'$                                      $n_1 = n_1'$
$n_2'$                                      $n_2 = n_2'$
$n_3'$                                      $\nu_3 = n_3' + 4\lambda^2/(n_3' + 4\lambda)$
$\alpha_1'$                                 $\alpha_1 = \alpha_1'$
$\alpha_2'$                                 $\alpha_2$ = root of $F_{n_3',n_2}(\alpha_2') = F_{\nu_3,n_2}(\alpha_2)$
$\alpha_3'$                                 $\alpha_3$ = root of $F_{n_3',n_1+n_2}(\alpha_3') = F_{\nu_3,n_1+n_2}(\alpha_3)$
$\theta_{21}' = \sigma_2^2/\sigma_1^2$      $\theta_{21} = \theta_{21}'$
$\theta_{32}' = \sigma_3^2/\sigma_2^2$      $\theta_{32} = (2\lambda + n_3')/n_3'$

6 See Appendix Table 4 for illustrations of the nature of the approximation to the integral $P_1$.
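
The parameter changes in the third and last rows of Table 4 amount to a few lines of arithmetic. The sketch below uses a hypothetical function name and assumes the noncentrality convention under which the noncentral $\chi^2$ with $n_3$ degrees of freedom has mean $n_3 + 2\lambda$, which is the convention implied by $\theta_{32} = (2\lambda + n_3)/n_3$.

```python
def patnaik_parameters(n3, lam):
    """Return (nu3, c, theta32) for the central approximation c * chi2(nu3).

    n3  : treatment degrees of freedom in the mixed model
    lam : noncentrality parameter (lambda), assumed >= 0
    """
    nu3 = n3 + 4.0 * lam ** 2 / (n3 + 4.0 * lam)   # modified degrees of freedom
    c = 1.0 + 2.0 * lam / (n3 + 2.0 * lam)         # scale factor
    theta32 = (2.0 * lam + n3) / n3                # modified variance ratio
    return nu3, c, theta32
```

With these values $c\,\nu_3 = n_3 + 2\lambda$ and $c^2\nu_3 = n_3 + 4\lambda$, so the first two moments of the noncentral statistic are reproduced; at $\lambda = 0$ the parameters are unchanged.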

We now compare the power for the mixed model as defined by (23) and (24) with the corresponding formulas (8) and (9), for modified values of the eight parameters as indicated in Table 4. Entering the random model tables with these altered parameters we obtain the mixed model power. It will be seen that when we deal with the size for the mixed model we have $\lambda = 0$ and hence $\nu_3 = n_3'$, so that all primed parameters agree with those without primes. Thus our entire size discussion to follow is directly applicable to the mixed model. On the other hand, the power evaluations, which refer to $\alpha_2 = \alpha_3 = .05$, will in general provide answers for larger values of $\alpha_2'$ and $\alpha_3'$, and these levels $\alpha_2'$ and $\alpha_3'$ will vary with $\lambda$. For a proper evaluation of power corresponding to a given pair of significance levels $\alpha_2'$ and $\alpha_3'$, say, $\alpha_2' = \alpha_3' = .05$, a more extensive tabulation of (8) and (9), as described in the subsequent section, would be required.

2.6 Application of derived formulas. The recurrence formulas derived in Section 2.3.2 were used to construct master tables of $P_1$ and $P_2$. These master tables were constructed for

$$\tfrac{1}{2} n_2 = 5, \quad \tfrac{1}{2} n_1 = 3(1)10, \quad \text{and} \quad \tfrac{1}{2} n_3 = 1(1)6.$$

Also, tables were constructed for $\tfrac{1}{2} n_2 = 3$, in order that the effect of small error degrees of freedom could be better studied. However, the latter tables were confined to the values

$$\tfrac{1}{2} n_3 = 1 \quad \text{and} \quad \tfrac{1}{2} n_1 = 4, 7, \text{ and } 10.$$

To compute the power component $P_1$ in a given problem from these master tables, for specified degrees of freedom $n_1$, $n_2$, and $n_3$ and levels of significance $\alpha_1$, $\alpha_2$, the values of the parameters $u_1$ and $u_2$ are computed from (10). From these values and those specified for $\theta_{21}$ and $\theta_{32}$, the corresponding values of $a$ and $d$ from (11) and hence the value of $x_2$ from (12) are computed. $P_1$ is then obtained by interpolation in the appropriate master table.

The procedure used to compute the component $P_2$ was similar, and required evaluation of and interpolation for the parameters $a$, $b$, $c$, $x_3$ and $x_4$; but interpolation with respect to $a$ was avoided by choosing values of $\theta_{21}$ which would result in tabular values of $a$. This accounts for the decimal values of $\theta_{21}$ found in our tables.

The approximate formula (21) for $P_2$ derived in Section 2.4 is exact for $n_1 = \infty$ and was found to be very effective for large $n_1$, yielding either $P_2$ values directly to sufficient accuracy, or facilitating extrapolation of the master tables.


3. Discussion of power and size curves and comparison of test procedures.
3.1 Type of recommendations attempted. We have seen that the power of our test procedures depends upon the following eight parameters: the degrees of freedom $n_1$, $n_2$, and $n_3$; the variance ratios $\theta_{21} = \sigma_2^2/\sigma_1^2$, $\theta_{32} = \sigma_3^2/\sigma_2^2$; and the levels of significance $\alpha_1$, $\alpha_2$, $\alpha_3$. Of these, the degrees of freedom $n_1$, $n_2$, and $n_3$ are completely determined by the analysis of variance table, while the variance ratios are generally unknown (except in the case of the size of the procedure, when $\theta_{32} = 1$). Any recommendations that are to be made must therefore be confined to the levels of significance, $\alpha_1$, $\alpha_2$, and $\alpha_3$. We shall here be primarily concerned with the size of the procedure being in the vicinity of .05. It will be apparent from what follows that a convenient way to achieve this is to choose $\alpha_2 = \alpha_3 = .05$, that is, to choose procedures in which the significance levels of both final tests are .05. However, the remaining parameter, $\alpha_1$, the level of significance for the preliminary test, is entirely at our disposal. In attempting recommendations, therefore, we shall be concerned with the choice of the level of $\alpha_1$. Should $\alpha_1$ be, say, .05, .25, .50, or should we use what Paull ([14], p. 4; [15]) has called the borderline test, where $\alpha_1$ will be near .70 to .80? In choosing the level of $\alpha_1$, we shall consider
(i) the variation in the size of our test procedure as a function of the parameter $\theta_{21}$, and

(ii) a comparison of the power of our test procedure with that of the never-pool test of the same size.

3.2 Size. The size of our test procedure does not equal the nominal level of .05, but varies about this level as $\alpha_1$ and $\theta_{21}$ vary. Figures 1 to 10⁷ give us examples of size curves, illustrating the variations in type one error with variation in $\theta_{21}$ for fixed values of the remaining parameters.


7 A selection of figures and tables has been assembled in the Appendix. Additional size and power curves and tables, illustrating the points to be made in the ensuing discussion, will be found in [8].

Note that as $\theta_{21}$ becomes large, the size approaches .05; for, as $\theta_{21} \to \infty$, the preliminary test will almost certainly be significant, pooling will almost certainly not occur, and hence the final test will almost certainly be that of $V_3/V_2$, having a size of .05.

At the lower extreme, that is, at $\theta_{21} = 1$, the size is at its minimum, which is less than .05. This minimum, and even more so the size peak, are points of particular interest.

We first consider the size peak. Referring to the size curves for a preliminary test carried out at the 5% level (see Figs. 1 and 2), we note that the peak is usually very high. Clearly, a preliminary test carried out at this level will in many cases admit an unacceptable size disturbance. This is due to the fact that at this level, the preliminary test will frequently admit pooling $V_2$ and $V_1$ when $\sigma_1^2$ is smaller than the true error mean square $\sigma_2^2$, and thereby increase the probability of type one error. We therefore seek a preliminary test in which pooling is admitted less readily; we next investigate the level $\alpha_1 = .25$. At this level (compare Figs. 2 and 3), size control is considerably better, and in many cases the peaks do not go beyond .08. (See, e.g., Figs. 3, 4, 5, and 6.) It is observed that, in general, the size peak increases as $n_1$ or $n_3$ increases or as $n_2$ decreases. (See Figs. 1 through 6.)

Fig. 3. Size curves for $n_3 = 2$, $n_2 = 10$, $\alpha_1 = .25$, $\alpha_2 = \alpha_3 = .05$
Fig. 5. Size curves for $n_3 = 6$, $n_2 = 10$, $\alpha_1 = .25$, $\alpha_2 = \alpha_3 = .05$
Fig. 6. Size curves for $n_3 = 6$, $n_2 = 16$, $\alpha_1 = .25$, $\alpha_2 = \alpha_3 = .05$

It is of course quite arbitrary to specify any rules for maintaining an acceptable upper tolerance for the size peak, since what is considered acceptable is a matter of opinion. In using a nominal size of .05, if we stipulate that our size peak should not go much beyond 10 per cent, then we find that even with the 25 per cent preliminary level, there are situations in which this upper limit is exceeded. Generally speaking, these unacceptable size peaks occur when

$$n_3 \ge n_2 \quad \text{and} \quad n_1 \ge 5 n_2. \tag{25}$$

(It should be noted that the occurrence of $n_3 > n_2$ is clearly rare.) This means that when the treatment degrees of freedom are greater than or equal to the error degrees of freedom, we must be careful if at the same time the doubtful error degrees of freedom are greater than or equal to five times the true error degrees of freedom; or, briefly, we must be careful when pooling promises a large gain in the precision of the error estimate. This rule has been established by an empirical study of an extensive number of size curves, and is not based on any analytic study. See, for example, Fig. 7. Here the situation represented by Curve 1 would be excluded by our rule. See also Figs. 8 and 9, in which the situations represented by Curves 1 and 2 would be excluded. If our rule is followed, size disturbances such as are represented in Figs. 3, 4, 5, and 6 occur; and also disturbances such as are represented in Figs. 7, 8, and 9 occur, with the exception of the excluded cases mentioned above. In the situations represented by (25), a more conservative level of $\alpha_1$ would be appropriate. From a study of a number of size curves it appears that a preliminary test at the 50 per cent level will ensure adequate control of the size peak in these cases. (See Fig. 9.)

Fig. 7. Size curves for $n_3 = 2$, $n_1 = \infty$, $\alpha_1 = .25$, $\alpha_2 = \alpha_3 = .05$

Not only the size peak, but also the size minimum is affected by the level of the preliminary test. From theorems proved by Paull ([14], Chap. 4; [15]), we know that the size of our test procedures is a minimum with respect to $\theta_{21}$ at $\theta_{21}$ equal to one, and that the size at this value of $\theta_{21}$ has a known lower bound. These lower bounds are .0475, .0375, and .025 for $\alpha_1 = .05$, .25, and .50, respectively, i.e., $.05(1 - \alpha_1)$ in each case. For some of our curves the plotted minimum sizes are situated very close to these lower bounds. For the borderline test, where, as proved by Paull ([14], [15]), the size is always less than .05, this lower bound varies in magnitude from approximately .01 to approximately .015. We have computed actual minimum size values for the borderline test for selected values of $n_1$, $n_2$, and $n_3$. For small $n_2$ and $n_3$, these are very close to their lower bounds, irrespective of $n_1$. A person using this test should therefore remember that he may be using a test which has a considerably lower size than .05. The actual disturbance is of course small, but the proportional disturbance is considerable. However, since the borderline test size disturbance is a reduction rather than an increase in size and is therefore on the conservative side, we are not attempting to make any definite rules as to when the experimenter should avoid the use of this test, but merely to remind him that large proportional size disturbances occur when $n_2$ and $n_3$ are both small ($\le 6$).

Fig. 8. Size curves for $n_3 = 6$, $n_2 = 6$, $\alpha_1 = .25$, $\alpha_2 = \alpha_3 = .05$

Summarizing our considerations of size control, therefore, we have narrowed down our recommendable range of $\alpha_1$ to $\alpha_1 \ge .25$, with the reservations that in certain cases characterized by inequalities (25), $\alpha_1 = .25$ would not be desirable, as it would admit too large a peak in the size curve; and that for very small values of $n_2$ and $n_3$ the experimenter may not wish to use the borderline test, as this would admit too low a size minimum.

The discussion thus far has been concerned with test procedures in which 





HELEN BOZIVICH, T. A. BANCROFT AND H. O. HARTLEY 


(1) $\alpha_1 = .25$, $n_1 = \infty$
(2) $\alpha_1 = .25$, $n_1 = 60$
(3) $\alpha_1 = .25$, $n_1 = 20$
(4) $\alpha_1 = .50$, $n_1 = \infty$


Fig. 9. Size curves for $n_3 = 12$, $n_2 = 10$, $\alpha_2 = \alpha_3 = .05$


$\alpha_2 = \alpha_3 = .05$. A few special cases for $\alpha_2 = \alpha_3 = .01$ and $\alpha_1 = .25$ have also been investigated. In all these situations, larger proportional size disturbances than those found for $\alpha_2 = \alpha_3 = .05$ were experienced, even for cases which our rule would accept. (See Fig. 10.)

3.3 Frequency of pooling. We have been discussing the effect of increasing $\alpha_1$ in order to achieve size control. It is obvious that for $\alpha_1 = 1$, our preliminary F per cent point would be zero, and pooling would never occur. The question arises as to the relative frequency of pooling for the intermediate values of $\alpha_1$ that we have been considering. When $\theta_{21} = 1$, the probability that $V_2/V_1$ exceeds $F_{\alpha_1}(n_2, n_1)$ is $\alpha_1$, so that pooling occurs with relative frequency $1 - \alpha_1$. As $\theta_{21}$ increases, this frequency rapidly decreases, approaching the limit zero as $\theta_{21}$ becomes infinite. Evaluations of these frequencies of pooling show that, while for $\alpha_1 = .25$ and small values of $\theta_{21}$, pooling will occur in the majority of experiments, when $\alpha_1 = .50$ the frequency is usually well below .50. This frequency of pooling is of course even smaller for the borderline test, where $\alpha_1$ usually takes on values in the neighborhood of .7 to .8; when such large values of $\alpha_1$ are employed, pooling occurs in only about 25 per cent of all situations for which $\theta_{21} = 1$, and this pooling percentage rapidly decreases as $\theta_{21}$ increases. While this property by itself cannot be regarded as a disadvantage of the borderline test, it is clear that, if this test were the only one recommended to the experimenter, he would hardly ever pool.

Fig. 10. Size curves for $n_3 = 12$, $n_1 = \infty$, $\alpha_2 = \alpha_3 = .01$

3.4 Power. We now attempt a comparison of the power of our sometimes-pool procedure with that of the never-pool test. As is well known, any comparison of power of any two test procedures is a fair comparison if the two test procedures have the same size. We have seen that the size of our sometimes-pool procedures is not at the constant level of .05, but varies about this, depending upon the parameter $\theta_{21}$. The method of power comparison we have therefore adopted is as follows:

(i) Assume a fixed value of the parameter $\theta_{21}$.
(ii) For this value of $\theta_{21}$, evaluate the size of the sometimes-pool test.

(iii) For this level of size, evaluate the power curve of the never-pool test; this power is then directly comparable with that of the sometimes-pool test corresponding to the chosen value of $\theta_{21}$.

For an illustration of such comparisons, see Table 1. Here we have $n_1 = 20$, $n_2 = 6$, $n_3 = 2$, $\alpha_1 = .25$ and $\alpha_2 = \alpha_3 = .05$. For $\theta_{21} = 1$, the sometimes-pool procedure is always more powerful than the never-pool test of the same size. For $\theta_{21} = 1.5$, the powers are very similar; on the other hand, for $\theta_{21} = 2$, the never-pool test is always more powerful. See also Tables 2 and 3, which again illustrate the fact that the sometimes-pool procedure is more powerful for small $\theta_{21}$ but less powerful for large $\theta_{21}$.

Fig. 11. Power gain of the sometimes-pool procedure over the never-pool test of the same size for $n_1 = 20$, $n_3 = 2$, $n_2 = 6$, $\alpha_1 = \alpha_2 = \alpha_3 = .05$.
Fig. 12. Power gain of the sometimes-pool procedure over the never-pool test of the same size for $n_1 = 20$, $n_3 = 2$, $n_2 = 6$, $F_1 = 2F_{.50}$, $\alpha_2 = \alpha_3 = .05$.

In order to show more clearly the dependence of these power differences on $\theta_{21}$, we have plotted in Figs. 11, 12 and 13 the difference between two corresponding power points against $\theta_{21}$. Here each curve corresponds to a fixed value of $\theta_{32}$, i.e., that value of $\theta_{32}$ at which the difference between the power ordinates of the power curves was taken. It will be seen, again, that for small $\theta_{21}$ the differences are positive (the sometimes-pool procedure is more powerful than the never-pool test), while for larger $\theta_{21}$ the position is reversed. As $\theta_{21} \to \infty$, the difference tends to 0, since both procedures tend to the never-pool test at the .05 level of significance. The transition from favorable to unfavorable power conditions generally occurs between $\theta_{21} = 1.5$ and $\theta_{21} = 2.0$. From other similar curves not shown here, it is seen that the magnitude of these power gains and losses increases with increasing $n_1$, or decreasing $n_2$, or increasing $n_3$.

Fig. 13. Power gain of the sometimes-pool procedure over the never-pool test of the same size for $n_1 = 20$, $n_3 = 2$, $n_2 = 6$, $\alpha_1 = .25$, $\alpha_2 = \alpha_3 = .05$.

Figures 11, 12, and 13 also illustrate the effect of decreasing the per cent point of F for the preliminary test. In these figures, corresponding power comparisons are given, respectively, for $\alpha_1 = .05$, $F_1 = 2F_{.50}(n_2, n_1)$,⁸ and $\alpha_1 = .25$. There is a general tendency for both power gains and power losses to diminish as the per cent point of F decreases, i.e., as $\alpha_1$ increases from values such as .05 through intermediate values such as .25 to the level of the borderline test (approximately $\alpha_1 = .70$ to .80). Here the gain in power has diminished further, but the power losses have completely disappeared. In fact, a theorem by Paull ([14], p. 61; [15]) proves that the borderline test is always more powerful than the corresponding never-pool test of the same size, although the power gain is small for large $\theta_{21}$. However, as stated in Section 3.2, this size is below the nominal level of .05. If we compare the borderline test power with that of the never-pool test at the nominal level of .05, the former is always less powerful. It is likewise less powerful than our sometimes-pool procedures for $\alpha_1 = .50$ and $\alpha_1 = .25$, which have, of course, a larger size.

We now attempt recommendations, considering the relative merits of the procedures at $\alpha_1 = .25$, $\alpha_1 = .50$, and $\alpha_1 = .7$ to .8 (the borderline level). These recommendations are somewhat subjective, since they are contingent upon what the experimenter may regard as a reasonable assumption concerning the parameter $\theta_{21}$.

8 This means we use a preliminary test $V_2/V_1 \ge 2F_{.50}(n_2, n_1)$.

(i) If the experimenter is reasonably certain that only small values of $\theta_{21}$ can be envisaged as a possibility, he is advised to use $\alpha_1 = .25$ except in the cases (25), when he should use $\alpha_1 = .50$ in order to ensure size control. Our figures show that the range of small values of $\theta_{21}$, when the sometimes-pool procedure gives a gain in power, is approximately between 1 and 1.5 to 2. An experimenter about to adopt this recommendation but not quite certain about his assumptions may wish to know the consequences which result from his adopting this procedure when, in fact, unknown to him, $\theta_{21}$ is large. It is seen from Figs. 1 to 10 that in such a situation he will still have control of the size of his test; in fact the size will be near .05 for large $\theta_{21}$. All he loses (as is illustrated by our power figures and tables) is the power of his test; this is a risk that he may well be prepared to take.

(ii) If, however, the experimenter can make no such assumption about $\theta_{21}$, and wishes to guard against the possibility of power losses, he may then use the borderline test, which would ensure a power gain, although he must realize

(a) that for large $\theta_{21}$ this gain would be very small;

(b) that for small $\theta_{21}$ he would use a test procedure of a very much smaller size than the nominal .05 (particularly when $n_2$ and $n_3$ are $\le 6$) and accordingly a test which is much less powerful than the never-pool test of size .05. In fact, he may in these circumstances prefer not to pool at all.

It may be correctly argued that, in order to control the size peak, to advocate $\alpha_1 = .50$ in the cases characterized by (25), and $\alpha_1 = .25$ otherwise, introduces an artificial discontinuity in our recommendations. It would be quite feasible (although it would require a considerable effort in computation) to evaluate for any given triplet $n_1$, $n_2$, and $n_3$ that value of $\alpha_1$ which results in a size peak of 0.10 exactly. Since this level of $\alpha_1$ would depend on the degrees of freedom $n_1$, $n_2$, and $n_3$, it would be necessary to evaluate the associated per cent points of F. For such recommendations to be useful, this table of $F_{\alpha_1}(n_2, n_1)$ (which would be a large 3 parametric table with $n_1$, $n_2$, and $n_3$ as arguments) would have to be published. To encumber the experimenter with special tables for the preliminary F-test in addition to the standard F-tables for the final F-tests appeared to us to be unnecessary, and the use of the published Merrington and Thompson [11] 25% and 50% points of F preferable.

We should note here that a rule favored by Paull ([14], Chap. 6; [15]) advocating testing the ratio $V_2/V_1$ against $2F_{.50}(n_2, n_1)$ will not ensure adequate control of the size peak, since $2F_{.50} > F_{.25}$ in general, and we have just seen that $F_{.25}$ is sometimes too large and hence not always acceptable as a significance level for the preliminary test. Also, it would appear to us that no rule of the form $V_2/V_1 >$ constant is very satisfactory, for with such a rule the frequency with which pooling occurs, as well as the size, varies considerably with the degrees of freedom $n_1$ and $n_2$.

Concerning recommendation (ii), the experimenter would require knowledge of the precise level of $\alpha_1$ for the borderline test, or, better still, the value of F associated with it. Paull ([14], p. 20; [15]) gives a simple formula from which the following is derived: the F point for the borderline test equals

$$\frac{n_1 F_{n_3,n_1+n_2}(\alpha_3)}{(n_1 + n_2) F_{n_3,n_2}(\alpha_2) - n_2 F_{n_3,n_1+n_2}(\alpha_3)},$$

where $F_{n_3,n_2}(\alpha_2)$ represents the $100\alpha_2$ per cent point of F with numerator df $n_3$ and denominator df $n_2$. Similar statements can be made for the other symbols.
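
The borderline F point is a one-line computation once the two final critical values are known. In the sketch below the function name and the arguments `f2` $= F_{n_3,n_2}(\alpha_2)$ and `f3` $= F_{n_3,n_1+n_2}(\alpha_3)$ are assumptions for illustration; the values used in the usage note are illustrative inputs, not figures from the paper.

```python
def borderline_f(n1, n2, f2, f3):
    """Preliminary-test F point of the borderline test.

    f2 : critical value of the never-pool final test, F_{n3,n2}(alpha2)
    f3 : critical value of the pooled final test, F_{n3,n1+n2}(alpha3)
    """
    denom = (n1 + n2) * f2 - n2 * f3
    if denom <= 0:
        raise ValueError("formula requires (n1 + n2) * f2 > n2 * f3")
    return n1 * f3 / denom
```

For example, `borderline_f(20, 6, 5.14, 3.37)` gives a value below one. A consistency property can be verified by direct substitution: if $V_2/V_1$ equals this F point and $V_3/V_2$ equals `f2`, then $V_3/V$ equals `f3`, so the two final tests agree exactly on the border.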

It has been noted that the above recommendations depend upon some a priori information regarding $\theta_{21}$. It is shown in a number of examples discussed in the Wright-Patterson report how this information can often be obtained from the general conditions under which the experiments were carried out.


APPENDIX 
FIGURES AND TABLES 


TABLE 1

The power of the sometimes-pool procedure and the never-pool test of the same size, for $n_1 = 20$, $n_2 = 6$, $n_3 = 2$, $\alpha_1 = .25$, $\alpha_2 = \alpha_3 = .05$

TABLE 2

The power of the sometimes-pool procedure and the never-pool test of the same size, for $n_1 = 20$, $n_2 = 10$, $n_3 = 12$, $\alpha_1 = .25$, $\alpha_2 = \alpha_3 = .05$

TABLE 3

The power of the sometimes-pool procedure and the never-pool test of the same size, for $n_1 = 14$, $n_2 = 10$, $n_3 = 12$, $\alpha_1 = .20$, $\alpha_2 = \alpha_3 = .05$

TABLE 4

Illustrating the nature of the approximation to the integral $P_1$ ($n_1 = 20$ throughout)


on s nesenasisasesi egies sna 
Exact | Approx. | iff. Exact | Approx. | Diff. 
0375 | .0375 | . .0375 | .0375 | 0000 
0574 | .0530 | . .0484 | .0441 | .0043 
.0621 | .0528 | .0093 | .0444 | .0364 | .0080 

0524 | .0396 | .0128 | .0274 | .0190 | .0084 | 

.0288 | .0187 | . 0088 | .0050 | .0038 | . 


.0375 | .0375 | . ri Fe .0000 

| |1.5 | .0692 | .0623 | . | 4 |. .0067 

bs = 6 0 | .0932 | .0757 | .0175 |. i .0112 
\ 





.0856 | .0618 | .0238 | j .0122 | .0086 





i 
For exact integral $P_1$, see Eq. (15).
For approximate integral $P_1$, see Section 2.4 and Formula (22).


0463 | .0305 | .0158 |. .0016 | .0018

ON A CLASS OF STOCHASTIC APPROXIMATION PROCESSES¹


By D. L. Burkholder²
University of North Carolina


1. Summary. We are concerned with the asymptotic behavior of stochastic 
approximation processes of the Robbins-Monro type [9], the Kiefer-Wolfo- 
witz type [7], and related types. Our main interest is establishing the asymp- 
totic normality, under appropriate conditions, of processes of certain kinds. 
However, a number of results on convergence with probability one are also 
obtained as immediate consequences of a theorem needed in the work on asymp- 
totic normality. Our results contain and extend some of the results in this area 
reported by Blum [1], Chung [3], Hodges and Lehmann [5], and others. In 
addition, we establish a number of results for cases not previously investi- 
gated. For instance, we show that the Kiefer-Wolfowitz processes are asymptoti- 
cally normal under quite general conditions. The rapidity of convergence de- 
pends on the amount that the function M (of Corollary 3.2) departs from 
symmetry in the neighborhood of the location of the maximum.’ We give results 
on convergence with probability one and asymptotic normality of stochastic 
approximation processes useful in connection with the problem of finding the 
location of the point of inflection of a function. For all cases in which we es- 
tablish asymptotic normality we also show how the unknown quantities in the 
variance of the limiting normal distribution can be estimated. These results 
make possible the construction of asymptotic confidence intervals free of un- 
knowns. Other results, too detailed to be summarized here, are established 
which also might be of interest in practical applications. 

Our method of procedure is as follows. We define a class of stochastic approximation processes, denoted by $A_0$, which contains the class of Robbins-Monro processes, the class of Kiefer-Wolfowitz processes, and some other related classes. We first study the class $A_0$ using methods similar to those used in [1], [3], and [5]. After various results at this level have been obtained, the results for the special cases follow with an economy of effort.


2. Introduction. Let N denote the set of natural numbers and R the set of
real numbers. If F is a distribution function and n is in N, let F^{[n]} denote the
distribution function of the arithmetic mean of a mutually independent random
variable set of size n, each element of which has the distribution function F. Let
M be a function from R into R. For each x in R let Y(x) be a random variable
with distribution function H(· | x), such that EY(x) = M(x). Let {a_n} be a
positive number sequence, let {r_n} be a natural number sequence, and let α
be in R. Let x_1 be a random variable, and if n is in N let


x_{n+1} = x_n - a_n(y_n - α),


where y_n is a random variable with conditional distribution function
H^{[r_n]}(· | x_n), given x_1, ..., x_n, y_1, ..., y_{n-1}. The random variable sequence
{x_n} will be called a stochastic approximation process of the type A_1.
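
As an illustration of the recursion just defined, the following is a minimal Python sketch of a type A_1 process. The function names, the logistic response curve, and the choices a_n = 4/n and r_n = 1 are assumptions made for this example only; they are not taken from the paper.

```python
import math
import random


def robbins_monro(sample_y, alpha, x1, a, r, n_steps, rng):
    """Run x_{n+1} = x_n - a_n (y_n - alpha) for n = 1, ..., n_steps.

    y_n is the arithmetic mean of r(n) independent draws of Y(x_n),
    so its conditional distribution is H^{[r_n]}(. | x_n).
    """
    x = x1
    for n in range(1, n_steps + 1):
        r_n = r(n)
        y_n = sum(sample_y(x, rng) for _ in range(r_n)) / r_n
        x = x - a(n) * (y_n - alpha)
    return x


def zero_one_response(x, rng):
    """Hypothetical zero-one response with P{Y(x) = 1} = 1 / (1 + exp(-x))."""
    p = 1.0 / (1.0 + math.exp(-x))
    return 1.0 if rng.random() < p else 0.0


# Estimate the median (alpha = 1/2) of the logistic curve; the true value is 0.
rng = random.Random(0)
median_estimate = robbins_monro(
    zero_one_response, alpha=0.5, x1=2.0,
    a=lambda n: 4.0 / n, r=lambda n: 1, n_steps=20000, rng=rng,
)
print(median_estimate)
```

With these assumed choices the slope of M at the root is 1/4, so the constant 4 in a_n = 4/n exceeds the threshold 1/(2 M'(θ)) = 2 that appears in the later results on asymptotic normality.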

This type of process (with r_n = r_1 for all n in N) was first developed and used
by Robbins and Monro [9]. They proved that if there is a real number θ such
that (x - θ)(M(x) - α) > 0 for all real x ≠ θ and if several other conditions
are satisfied, then {x_n} converges to θ in the mean, which implies, of course,
that {x_n} converges to θ in probability. Wolfowitz [12] proved convergence in
probability under less restrictive conditions. Later, Blum [1], Kallianpur [6],
and Kiefer and Wolfowitz (see footnote 2 of [1]) obtained independently and
under somewhat different sets of conditions that {x_n} converges to θ with
probability one. Schmetterer [10, 11] and Kallianpur [6] have reported results on the
order of magnitude of E(x_n - θ)^2 under various conditions. Chung [3] has shown
that under certain conditions the moments of n^{1/2}(x_n - θ) converge to the
moments of a normal distribution. This implies, of course, that n^{1/2}(x_n - θ) is
asymptotically normal. Hodges and Lehmann [5] have relaxed Chung's set of
conditions and have obtained asymptotic normality, at the sacrifice, however,
of information about the moments of n^{1/2}(x_n - θ).

Let M, Y(·) and H(· | ·) be as before, let each of {a_n} and {c_n} be a positive
number sequence, and let {r_n} be a natural number sequence. Let x_1 be a random
variable, and if n is in N let


x_{n+1} = x_n - (a_n/c_n)(y_{2n-1} - y_{2n}),


where y_{2n-1} and y_{2n} are random variables which are conditionally independently
distributed according to H^{[r_n]}(· | x_n - c_n) and H^{[r_n]}(· | x_n + c_n), respectively,
given x_1, ..., x_n, y_1, ..., y_{2n-2}. The random variable sequence {x_n} will be
called a stochastic approximation process of the type A_2.
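
The type A_2 recursion can be sketched in the same way. In the Python fragment below, the noisy parabola, the sequences a_n = 0.5/n and c_n = n^{-1/4}, and the restriction to r_n = 1 are illustrative assumptions rather than choices made in the paper.

```python
import random


def kiefer_wolfowitz(sample_y, x1, a, c, n_steps, rng):
    """Run x_{n+1} = x_n - (a_n / c_n)(y_{2n-1} - y_{2n}) with r_n = 1."""
    x = x1
    for n in range(1, n_steps + 1):
        a_n, c_n = a(n), c(n)
        y_odd = sample_y(x - c_n, rng)   # y_{2n-1}, observed at x_n - c_n
        y_even = sample_y(x + c_n, rng)  # y_{2n}, observed at x_n + c_n
        x = x - (a_n / c_n) * (y_odd - y_even)
    return x


def noisy_parabola(x, rng):
    """Hypothetical response: M(x) = -(x - 1)^2 plus Gaussian noise; maximum at 1."""
    return -(x - 1.0) ** 2 + rng.gauss(0.0, 0.5)


rng = random.Random(0)
argmax_estimate = kiefer_wolfowitz(
    noisy_parabola, x1=3.0,
    a=lambda n: 0.5 / n, c=lambda n: n ** -0.25, n_steps=5000, rng=rng,
)
print(argmax_estimate)
```

The assumed sequences satisfy c_n → 0, Σ a_n = ∞ and Σ (a_n/c_n)^2 < ∞, which are the kinds of rate conditions used for this type of process.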

This type of process (with r_n = r_1 for all n in N) was first developed and used
by Kiefer and Wolfowitz [7]. They proved that if there is a real number θ such
that M is increasing for x < θ and decreasing for x > θ and if several other
conditions are satisfied, then {x_n} converges to θ in probability. Later, Blum [1]
proved the stronger result that {x_n} converges to θ with probability one under
less restrictive conditions.

Let M, Y(·), H(· | ·), {a_n}, {c_n}, and {r_n} be as before. Let x_1 be a random
variable, and if n is in N let


x_{n+1} = x_n - (a_n/c_n^2)[y_{3n-1} - (y_{3n-2} + y_{3n})/2],


where y_{3n-2}, y_{3n-1}, and y_{3n} are random variables which are conditionally mutually
independently distributed according to H^{[r_n]}(· | x_n - c_n), H^{[r_n]}(· | x_n), and
H^{[r_n]}(· | x_n + c_n), respectively, given x_1, ..., x_n, y_1, ..., y_{3n-3}. The random
variable sequence {x_n} will be called a stochastic approximation process of the
type A_3. This type of process is useful in connection with the problem of finding
the location of the point of inflection of a function, as will be seen later.
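
A short Python sketch of the type A_3 step follows. To keep the arithmetic checkable by hand it is run without noise on a hypothetical cubic; the function name, the cubic, and the constant sequences are assumptions for illustration only.

```python
def inflection_process(sample_y, x1, a, c, n_steps, rng):
    """Run x_{n+1} = x_n - (a_n / c_n^2)[y_{3n-1} - (y_{3n-2} + y_{3n}) / 2], r_n = 1."""
    x = x1
    for n in range(1, n_steps + 1):
        a_n, c_n = a(n), c(n)
        y_left = sample_y(x - c_n, rng)   # y_{3n-2}, observed at x_n - c_n
        y_mid = sample_y(x, rng)          # y_{3n-1}, observed at x_n
        y_right = sample_y(x + c_n, rng)  # y_{3n}, observed at x_n + c_n
        x = x - (a_n / c_n ** 2) * (y_mid - (y_left + y_right) / 2.0)
    return x


# Noise-free check with the hypothetical cubic M(x) = -(x - 2)^3, whose point of
# inflection is 2. Here M(x) - [M(x - c) + M(x + c)]/2 = 3 c^2 (x - 2), so each
# step multiplies x - 2 by 1 - 3 a_n.
x_after_two_steps = inflection_process(
    lambda x, rng: -(x - 2.0) ** 3, x1=3.0,
    a=lambda n: 0.25, c=lambda n: 1.0, n_steps=2, rng=None,
)
print(x_after_two_steps)  # 2.0625
```

The second difference in brackets behaves like -c_n^2 M''(x_n)/2 for small c_n, which is why the step is normalized by c_n^2 and why the iterates drift toward a zero of M''.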

It is convenient to study first a type of process slightly more general than any
of those defined above. Hence we make the following definition: For each n in
N, let R_n be a function from R into R. For each n, x in N × R, let Z_n(x) be a
random variable with distribution function G_n(· | x) such that EZ_n(x) = R_n(x).
Let {a_n} be a positive number sequence. Let x_1 be a random variable, and if n
is in N let


x_{n+1} = x_n - a_n z_n,


where z_n is a random variable with conditional distribution function G_n(· | x_n),
given x_1, ..., x_n, z_1, ..., z_{n-1}. The random variable sequence {x_n} will be
called a stochastic approximation process of the type A_0.

Without confusion, let A_i denote the class of stochastic approximation
processes of the type A_i, i = 0, 1, 2, 3. It is easy to see that each of A_1, A_2, and
A_3 is a subclass of A_0.
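
The inclusion of A_2 in A_0 can be made concrete with a small Python sketch. The helper names type_a0 and as_a0_from_a2 are hypothetical; the sketch assumes r_n = 1 and uses a noise-free parabola so that the output can be verified by hand.

```python
def type_a0(sample_z, x1, a, n_steps, rng):
    """Run x_{n+1} = x_n - a_n z_n, where z_n is drawn from G_n(. | x_n)."""
    x = x1
    for n in range(1, n_steps + 1):
        x = x - a(n) * sample_z(n, x, rng)
    return x


def as_a0_from_a2(sample_y, a, c):
    """Recast a type A_2 process (r_n = 1) as a type A_0 process.

    Z_n(x) = Y(x - c_n) - Y(x + c_n), and the A_0 step size is a_n / c_n.
    """
    def sample_z(n, x, rng):
        return sample_y(x - c(n), rng) - sample_y(x + c(n), rng)

    def step(n):
        return a(n) / c(n)

    return sample_z, step


# Noise-free illustration with the hypothetical M(x) = -(x - 1)^2.
sample_z, step = as_a0_from_a2(
    lambda x, rng: -(x - 1.0) ** 2, a=lambda n: 0.25, c=lambda n: 0.5,
)
print(type_a0(sample_z, x1=3.0, a=step, n_steps=3, rng=None))  # 1.0
```

A type A_1 process fits the same template with Z_n(x) = Y(x) - α and unchanged step sizes.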

Throughout this paper, unless otherwise indicated, a passage to the limit will
be for n → ∞. Also, if n is in N, then V_n will denote the function defined by
V_n(x) = var Z_n(x) for x in R, and V will denote the function defined by V(x) =
var Y(x) for x in R.


3. Some results on convergence with probability one. The following theorem 
is needed in connection with our study of the asymptotic normality of stochastic 
approximation processes of the type A_0. However, it also has several immediate
applications which we will discuss briefly in this section. 

THEOREM 1. Suppose {x_n} is a stochastic approximation process of the type A_0
and θ is a real number such that


(i) there is a function Q from the positive real numbers into N such that if ε > 0,
|x - θ| > ε, and n > Q(ε), then (x - θ)R_n(x) > 0;
(ii) sup_{n,x} [|R_n(x)|/(1 + |x|)] < ∞;
(iii) sup_{n,x} V_n(x) < ∞;
(iv) if 0 < δ_1 < δ_2 < ∞, then Σ a_n [inf |R_n(x)|] = ∞, where the infimum is
taken over δ_1 ≤ |x - θ| ≤ δ_2;
(v) Σ a_n^2 < ∞;
(vi) if n is in N, then R_n and V_n are Borel measurable.

Then P{lim x_n = θ} = 1.

We note that condition (vi) implies that if X is a random variable, then each
of R_n(X) and V_n(X) is a random variable. Moreover, V_n(X) is integrable, using
(iii).

The proof of Theorem 1 is omitted, since the methods used are similar to those 
used by Blum [1] in the proof of his Theorem 1. 
COROLLARY 1.1. Suppose {x_n} is a stochastic approximation process of the type
A_2 and θ is a real number such that


(i) M is increasing for x < θ and decreasing for x > θ;

(ii) sup_x [|M(x - h) - M(x + h)|/(1 + |x|)] < ∞ for some number h > 0;

(iii) V is a bounded, Borel measurable function;

(iv) if 0 < δ_1 < δ_2 < ∞, then inf (|M(x - ε) - M(x + ε)|/ε) > 0 for
δ_1 ≤ |x - θ| ≤ δ_2, 0 < ε < δ_1;

(v) c_n → 0, Σ a_n = ∞, Σ (a_n/c_n)^2 < ∞.

Then P{lim x_n = θ} = 1.

PROOF. Let G_n(· | x) = F_n^{[r_n]}(· | x), where F_n(· | x) is the distribution function
of the sum of two independent random variables, one having the same distribution
function as Y(x - c_n), the other having the same distribution function as
-Y(x + c_n). Let Z_n(x) be a random variable with distribution function G_n(· | x).
Let R_n(x) = M(x - c_n) - M(x + c_n), z_n = y_{2n-1} - y_{2n}, and a'_n = a_n/c_n. Thus,
clearly, {x_n} is in A_0, with a'_n in the role of the step size.

Conditions (i) and (ii) of the corollary imply that for each positive number
k, sup [|M(x - ε) - M(x + ε)|/(1 + |x|)] < ∞ for 0 < ε < k and x in R.
This, in turn, implies that condition (ii) of Theorem 1 is satisfied here.

The reader can easily verify for himself that the other conditions of Theorem 
1 are satisfied here, and hence that the desired result holds. 

Corollary 1.1 is relevant in connection with the problem of obtaining
information about the location θ of the maximum of a regression function M. The above
set of conditions is less restrictive than Blum’s set of conditions for this case 
(see Theorem 2 of [1]) in several respects. For instance, regression functions 
which are parabolic, e.g., M(x) = -x^2, and regression functions with bounded
value sets, e.g., M(x) = 7. are allowed here but are excluded by Blum’s set 
of conditions. 

Sometimes information about M(θ) is desired. The following corollary shows
how it can be obtained by using the same data used in the approximation of θ.

COROLLARY 1.1.1. Suppose the conditions of Corollary 1.1 hold. If, in addition,
M is continuous at θ, then P{lim Σ_{j=1}^n (y_j/n) = M(θ)} = 1.

PROOF. The proof consists of an application of the following special case of
Loève's Theorem A [8].

LEMMA 1. If {v_n} is a random variable sequence such that Σ (Ev_n^2/n^2) < ∞, then
Σ_{j=1}^n [v_j - E(v_j | v_1, ..., v_{j-1})]/n → 0 with probability one.

Here let v_{2n-1} = y_{2n-1} - M(x_n - c_n) and v_{2n} = y_{2n} - M(x_n + c_n). Then
Ev_{2n-1}^2 = EV(x_n - c_n)/r_n and Ev_{2n}^2 = EV(x_n + c_n)/r_n. By condition (iii) of
Corollary 1.1, it follows that Σ (Ev_n^2/n^2) < ∞. It is easily proved that
E(v_{2n-1} | v_1, ..., v_{2n-2}) = E(v_{2n} | v_1, ..., v_{2n-1}) = 0. Thus, Σ_{j=1}^n v_j/n → 0 with
probability one. By Corollary 1.1, P{lim (x_n ± c_n) = θ} = 1. Thus, since M is
continuous at θ, P{lim M(x_n ± c_n) = M(θ)} = 1. These facts imply the desired
result.

We now consider the implications of Theorem 1 for stochastic approximation
processes of the type A_3. Under certain conditions such processes may be used
to obtain information about the points of inflection of M. However, the results
given here will not be stated for the general situation but will be given in terms
of the special case of the approximation of the mode of a probability density
function. The results for the more general situation follow from Theorem 1 in
much the same way.

Throughout the paper, if 0 ≤ p ≤ 1, let B(· | p) be the distribution function
such that B(y | p) = 0 if y < 0; 1 - p if 0 ≤ y < 1; and 1 if y ≥ 1.

COROLLARY 1.2. Suppose {x_n} is a stochastic approximation process of the
type A_3 such that M is a distribution function with associated density function f,
H(· | x) = B(· | M(x)) for all real x, θ is a real number, and


(i) f is increasing for x < θ and decreasing for x > θ;

(ii) if 0 < δ_1 < δ_2 < ∞, then inf (|f(x - ε) - f(x + ε)|/ε) > 0 for
δ_1 ≤ |x - θ| ≤ δ_2, 0 < ε < δ_1;

(iii) c_n → 0, Σ a_n = ∞, Σ (a_n/c_n^2)^2 < ∞.


Then P{lim x_n = θ} = 1.

PROOF. Let G_n(· | x) = F_n^{[r_n]}(· | x), where F_n(· | x) is the distribution function
of the sum of three mutually independent random variables, the first having the
same distribution function as Y(x), the second having the same distribution
function as -Y(x - c_n)/2, and the third having the same distribution function
as -Y(x + c_n)/2. Let Z_n(x) be a random variable with distribution function
G_n(· | x). Let R_n(x) = M(x) - [M(x - c_n) + M(x + c_n)]/2, z_n = y_{3n-1} -
(y_{3n-2} + y_{3n})/2, and a'_n = a_n/c_n^2. Thus, {x_n} is in A_0, with a'_n in the role of
the step size. The remainder of the proof is straightforward.

COROLLARY 1.2.1. Suppose the conditions of Corollary 1.2 hold. In addition,
suppose f is continuous on an open interval containing θ and Σ (1/(n c_n))^2 < ∞.
Then P{lim Σ_{j=1}^n [(y_{3j} - y_{3j-2})/(2 c_j n)] = f(θ)} = 1.

The proof is similar to the proof of Corollary 1.1.1. 

It is not hard to see that Theorem 1 also implies a result on the stochastic 
approximation of the root of a regression equation. In particular, Theorem 1 
implies Blum’s [1] Theorem 1 which deals with this case. 

We note in passing that our Theorem 1 can be generalized to the case where
θ does not exist uniquely. One implication is that if {x_n} is a stochastic
approximation process of the type A_1 such that M is a distribution function, H(· | x) =
B(· | M(x)) for all real x, 0 < α < 1, Σ a_n^2 < ∞ and Σ a_n = ∞, then there is
a random variable x_0 such that P{lim x_n = x_0, θ_1 ≤ x_0 ≤ θ_2} = 1, where θ_1 =
sup {x | M(x) < α} and θ_2 = inf {x | M(x) > α}. Thus, information about the
quantiles of a distribution function can be obtained by using stochastic
approximation methods, even though some of the quantiles may not exist uniquely.
The details are given in [2].


4. Asymptotic normality of stochastic approximation processes of the type A_0.
LEMMA 2. Suppose {b_n} is a nonnegative number sequence and each of {c_n} and
{d_n} is a real number sequence such that


b_{n+1} ≤ b_n[1 - c_n/n] + d_n/n^{p+1} + O(1)/n^{q+1},

b_{n+1} ≤ b_n[1 - c_n/n] + d_n/n^{p+1} + O(1)b_n^r/n^{q+1},


where lim inf c_n = c > p > 0, lim d_n = d ≥ 0, 0 < r < 1, and p(1 - r) < q.
Then lim sup n^p b_n ≤ d/(c - p).

This lemma is related to Chung’s Lemma 1 [3]. The proof uses Chung’s lemma 
and an induction argument. 

LEMMA 3. Let {x_n} be a random variable sequence, and if n is in N let each of
f_n and g_n be a Borel measurable function from R into R. Suppose {f_n} is continuously
convergent at the number θ to the number β (that is, if {y_n} is a real number sequence
with limit θ, then f_n(y_n) → β) and is uniformly bounded. Suppose that if n is in N,
then Eg_n(x_n) exists (finitely), and if δ > 0, then


E{|g_n(x_n)| | |x_n - θ| ≥ δ} P{|x_n - θ| ≥ δ} = o(1)E|g_n(x_n)|.


Then E{f_n(x_n)g_n(x_n)} - βEg_n(x_n) = o(1)E|g_n(x_n)|.

PROOF. Let K = sup_{n,x} |f_n(x)|. By assumption, K is in R. Also, the
assumptions imply that E{f_n(x_n)g_n(x_n)} exists for all n. Let ε > 0. Since {f_n} is
continuously convergent at θ to β, there is a δ > 0 and an m in N such that if |x - θ|
< δ and n > m, then |f_n(x) - β| < ε. Thus, for n > m,


|E{(f_n(x_n) - β)g_n(x_n)}| ≤ E{|f_n(x_n) - β| |g_n(x_n)|}
    ≤ εE{|g_n(x_n)| | |x_n - θ| < δ} P{|x_n - θ| < δ}
      + (K + |β|)E{|g_n(x_n)| | |x_n - θ| ≥ δ} P{|x_n - θ| ≥ δ}
    ≤ [ε + o(1)]E|g_n(x_n)|.


The desired result is implied. 

We now give several results on the asymptotic normality of stochastic
approximation processes of the type A_0. Although Theorem 2 has implications for
the special cases, it is here used only as a step toward Theorem 3. The two
theorems will be compared in more detail later.

The methods used in proving Theorem 2 are similar in many respects to the
methods used by Chung [3] in his study of the class of Robbins-Monro processes.
The main modifications are due chiefly to the fact that usually where Chung is
working with a single function, it is necessary here to work with a function
sequence. The assumption that there is a real number sequence {μ_n} converging
to θ such that x - μ_n and R_n(x) have the same sign for x ≠ μ_n removes many of
the difficulties involved in this approach.

THEOREM 2. Suppose {x_n} is a stochastic approximation process of the type A_0,
{μ_n} is a real number sequence, {c_n} is a positive number sequence, θ is a real number,
and each of β, T̲, T̄, V̲, V̄, σ^2, η, ξ, c, and d is a positive number such that


(i) R_n(μ_n) = 0 for all n;
(ii) the function sequence {T_n}, where for each n in N, T_n(x) = R_n(x)/[c_n(x - μ_n)]
if x ≠ μ_n, = β if x = μ_n, is continuously convergent at θ to β and satisfies
T̲ ≤ T_n(x) ≤ T̄ for all n, x;
(iii) {V_n} is continuously convergent at θ to σ^2 and satisfies V̲ ≤ V_n(x) ≤ V̄
for all n, x;
(iv) if r is in N, then sup_{n,x} E|Z_n(x) - R_n(x)|^r < ∞;
(v) G_n(y | ·) is Borel measurable for all n, y;
(vi) μ_n - θ = O(n^{-η}), 0 < ξ ≤ 1/2, ξ < η, n^{ξ+1/2}a_n → c, na_nc_n → d > ξ/T̲;
(vii) all moments of x_1 exist.


Then, if r is in N,

lim n^{rξ}E(x_n - θ)^r = [σ^2c^2/(2βd - 2ξ)]^{r/2} (r - 1)(r - 3) ··· 3·1, if r is even,
                       = 0, if r is odd,


which implies that n^ξ(x_n - θ) is asymptotically normal (0, σ^2c^2/2(βd - ξ)).

PROOF. If n is in N, let b_n^{(r)} = E(x_n - θ)^r, if r is a nonnegative integer, β_n^{(r)} =
E|x_n - θ|^r, if r ≥ 0, and b_n = b_n^{(2)}. Without going into detail we remark that
these expectations and the other expectations written below exist. The necessary
integrability of certain functions is assured by boundedness conditions such as
(iv), together with the measurability condition (v). That each of R_n(x) and
E|Z_n(x) - R_n(x)|^r is Borel measurable in x, for instance, can be seen by
considering Stieltjes approximating sums to the integrals involved,
R_n(x) = ∫_{-∞}^{∞} y dG_n(y | x), and so forth.

If r is in N, then


(1)  b_{n+1}^{(r)} = b_n^{(r)} + Σ_{k=1}^r C(r, k)(-a_n)^k H_k(r, n),


where C(r, k) is the binomial coefficient and H_k(r, n) = E[(x_n - θ)^{r-k} z_n^k]. By
conditions (i) and (ii), R_n(x) = c_nT_n(x)(x - μ_n) for all n, x. If r is in N, then


     H_1(r, n) = E{E[(x_n - θ)^{r-1} z_n | x_n]}
(2)            = E[(x_n - θ)^{r-1} R_n(x_n)]
               = c_nE[T_n(x_n)(x_n - θ)^r] + c_n(θ - μ_n)E[T_n(x_n)(x_n - θ)^{r-1}].

Thus, if r is even, then

(3)  H_1(r, n) ≥ c_nT̲b_n^{(r)} - c_n|θ - μ_n|T̄β_n^{(r-1)}

(4)            ≥ c_nT̲b_n^{(r)} - c_n|θ - μ_n|T̄[b_n^{(r)}]^{(r-1)/r}.


Also, since 2β_n^{(r-1)} ≤ β_n^{(r-2)} + β_n^{(r)} for r ≥ 2, we have by (3) and the relation
|θ - μ_n| = o(1), which is implied by (vi), that if r is even, then


(5)  H_1(r, n) ≥ [c_nT̲ + o(c_n)]b_n^{(r)} - c_n|θ - μ_n|T̄β_n^{(r-2)}/2.
If r is in N and r ≥ 2, then

(6)  H_2(r, n) = E[V_n(x_n)(x_n - θ)^{r-2}] + c_n^2E[T_n^2(x_n)(x_n - θ)^{r-2}(x_n - μ_n)^2].

Since (x_n - μ_n)^2 ≤ 2(x_n - θ)^2 + 2(θ - μ_n)^2, we obtain

(7)  |H_2(r, n)| ≤ [V̄ + o(1)]β_n^{(r-2)} + 2c_n^2T̄^2β_n^{(r)}.

Also, it is easily shown that if each of r and k is in N and r ≥ k, then

(8)  |H_k(r, n)| = O(1)β_n^{(r-k)} + O(1)c_n^kβ_n^{(r)}.


We shall now prove that (I) if r is a positive number, then there is a positive
number B_r such that


(9)  lim sup n^{rξ}β_n^{(r)} ≤ B_r.

Since [β_n^{(r)}]^{1/r} is nondecreasing in r for r > 0, it suffices to show that (9) holds
for each even r.
Consider the case r = 2. Using (5) and (7) in (1) gives

b_{n+1} ≤ b_n[1 - 2na_nc_nT̲/n + o(1)/n] + n^{2ξ+1}a_n^2[V̄ + o(1)]/n^{2ξ+1} + O(1)/n^{η+1}.

Using (4) and (7) in (1) gives

b_{n+1} ≤ b_n[1 - 2na_nc_nT̲/n + o(1)/n] + n^{2ξ+1}a_n^2[V̄ + o(1)]/n^{2ξ+1} + O(1)b_n^{1/2}/n^{η+1}.

The assumptions of the theorem and the above relations imply, by Lemma 2,
that lim sup n^{2ξ}b_n ≤ c^2V̄/2(T̲d - ξ). Thus, (9) holds for r = 2.

Suppose that r is even, r > 2, and that (9) holds for each even natural number
less than r. Then, of course, β_n^{(k)} = O(n^{-kξ}) for each positive number k ≤ r - 2.
Using this fact and (8) gives


(10)  Σ_{k=3}^r C(r, k)(-a_n)^k H_k(r, n) = o(n^{-rξ-1}) + o(n^{-1})b_n^{(r)}.


Using the induction hypothesis and substituting (5), (7), and (10) in (1) and
(4), (7), and (10) in (1) gives two systems of inequalities which, upon applying
Lemma 2, yield the relation lim sup n^{rξ}b_n^{(r)} ≤ (r - 1)c^2V̄B_{r-2}/2(T̲d - ξ). By
induction, (9) holds for each even r. Thus, (I) is proved.

Next, it will be shown that (II) if δ > 0 and r is in N, then


E{|x_n - θ|^r | |x_n - θ| ≥ δ} P{|x_n - θ| ≥ δ} = o(1)β_n^{(r)}.

If r ≥ 0, δ > 0, q > 0, then, by (I),

(11)  E{|x_n - θ|^r | |x_n - θ| ≥ δ} P{|x_n - θ| ≥ δ}
          ≤ δ^{-2q/ξ}β_n^{(r+2q/ξ)} = o(n^{-q}).


Since x_{n+1} - E(x_{n+1} | x_n) = -a_n[z_n - R_n(x_n)], we have that
E{(x_{n+1} - θ)^2 | x_n} ≥ E{[x_{n+1} - E(x_{n+1} | x_n)]^2 | x_n}
= a_n^2E{[z_n - R_n(x_n)]^2 | x_n} = a_n^2V_n(x_n) ≥ a_n^2V̲.
Thus, (n + 1)^{2λ}b_{n+1} ≥ n^{2λ}a_n^2V̲ = c^2V̲ + o(1), where λ = ξ + 1/2. Therefore, if
r ≥ 2, then

(12)  lim inf n^{rλ}β_n^{(r)} ≥ lim inf (n^{2λ}b_n)^{r/2} ≥ (c^2V̲)^{r/2}.


Furthermore, n^{2λ}β_n^{(1)} ≥ n^{2λ}E{|x_n - θ|^2 | |x_n - θ| < 1} P{|x_n - θ| < 1} =
n^{2λ}b_n + o(1), using (11). Thus,


(13)  lim inf n^{2λ}β_n^{(1)} ≥ c^2V̲.

Relations (11), (12), and (13) imply the assertion (II).

Using (ii), (vi), (I), and (II) in (2) gives, by Lemma 3, that if r is in N, then

(14)  H_1(r, n) = c_nβb_n^{(r)} + c_n o(n^{-rξ}).

Similarly, using (iii), (vi), (I), and (II) in (6) gives that if r is in N and r ≥ 2,
then

(15)  H_2(r, n) = σ^2b_n^{(r-2)} + o(n^{-(r-2)ξ}).


With r = 1, using (14) in (1) gives b_{n+1}^{(1)} = b_n^{(1)}(1 - na_nc_nβ/n) + o(n^{-ξ-1}),
implying that there is a natural number m such that if n > m then |b_{n+1}^{(1)}| ≤
|b_n^{(1)}|(1 - na_nc_nβ/n) + o(n^{-ξ-1}). Chung's Lemma 1 implies here that lim n^ξ
|b_n^{(1)}| = 0. Thus, lim n^ξb_n^{(1)} = 0.

With r = 2, using (14) and (15) in (1) gives

b_{n+1} = b_n[1 - 2na_nc_nβ/n] + [n^{2ξ+1}a_n^2σ^2 + o(1)]/n^{2ξ+1}.


Applications of Chung's Lemmas 1 and 2 here yield the fact that lim n^{2ξ}b_n =
σ^2c^2/2(βd - ξ).

A simple inductive argument shows that, in general, if r is in N, then lim
n^{rξ}b_n^{(r)} exists and is equal to the quantity specified in the conclusion of the
theorem. Thus, the theorem is proved.

THEOREM 3. Suppose {x_n} is a stochastic approximation process of the type
A_0, {μ_n} is a real number sequence, {c_n} is a positive number sequence, θ is a real
number, and each of n_0, β, σ^2, ξ, η, ε, c, and d is a positive number such that
conditions (ii) and (iv) of Theorem 1 and (v) of Theorem 2 are satisfied and


(i) if n > n_0 and x ≠ μ_n, then (x - μ_n)R_n(x) > 0 and R_n(μ_n) = 0;
(ii) {T_n}, defined as in Theorem 2, is continuously convergent at θ to β;
(iii) {V_n} is uniformly bounded and is continuously convergent at θ to σ^2;
(iv) if r is in N, then sup E|Z_n(x) - R_n(x)|^r < ∞ for |x - θ| < ε, n > n_0;
(v) μ_n - θ = O(n^{-η}), 0 < ξ ≤ 1/2, ξ < η, n^{ξ+1/2}a_n → c, na_nc_n → d > ξ/β.


Then n^ξ(x_n - θ) is asymptotically normal (0, σ^2c^2/2(βd - ξ)).

The conclusion of Theorem 3 is not as strong, of course, as the conclusion of
Theorem 2. On the other hand, the set of conditions of Theorem 3 is considerably
less restrictive. Conditions (ii) and (iv) of Theorem 1 are much weaker
restrictions on {R_n} than the assumptions involving T̲ and T̄ of Theorem 2. In Theorem
3, assumptions about the moments of G_n(· | x) higher than the second are made
only for x in a neighborhood of θ. No assumptions about the moments of the
random variable x_1 are made, and so forth.

In the light of the above, the special cases will be considered here in terms of
Theorem 3 only.

The truncation device used in the proof was introduced by Hodges and
Lehmann [5] in their work on the class of Robbins-Monro processes. With it they
were able to relax a condition of Chung's Theorem 9 [3] analogous to the one
involving T̲ in condition (ii) of Theorem 2.

PROOF. Let T̲, T̄, V̲, and V̄ be positive numbers satisfying ξ/d < T̲ < β <
T̄ and V̲ < σ^2 < V̄. By assumptions (ii) and (iii), there is a positive number δ < ε
and a natural number n_1 > n_0 such that if x is in I = [θ - δ, θ + δ] and n > n_1,
then T̲ < T_n(x) < T̄ and V̲ < V_n(x) < V̄. Let y be in R and let ε > 0. The
conditions of Theorem 1 are satisfied here. Thus, P{lim x_n = θ} = 1, which
implies there is a natural number m > n_1 such that


(16)  P{sup_n |x_{m+n} - θ| ≤ δ} ≥ 1 - ε.


For each n, x in N × R, let Z'_n(x) = Z_n(x), if x is in I; Z_n(θ - δ) - R_n(θ - δ) +
βc_n(x - μ_n) if x is not in I; let G'_n(· | x) be the distribution function of Z'_n(x) and
let R'_n(x) = EZ'_n(x). Let x'_1 be the random variable such that x'_1 = x_{m+1} if x_{m+1} is
in I; θ, otherwise. For each n in N, let x'_{n+1} = x'_n - a_{m+n}z'_n, where z'_n is a random
variable with conditional distribution function G'_{m+n}(· | x'_n), given x'_1, ..., x'_n,
z'_1, ..., z'_{n-1} and is such that P{z'_n = z_{m+n} | A} = 1, where A = {x'_1 = x_{m+1}, ...,
x'_n = x_{m+n}, z'_1 = z_{m+1}, ..., z'_{n-1} = z_{m+n-1}}.

From (16) and the definition of {x'_n}, we have that if n is in N, then
P{x'_n ≠ x_{m+n}} ≤ ε, implying that


|P{(m + n)^ξ(x'_n - θ) ≤ y} - P{(m + n)^ξ(x_{m+n} - θ) ≤ y}| ≤ ε.


It is clear that {x'_n} is a stochastic approximation process of the type A_0 and
satisfies the conditions of Theorem 2. Therefore, n^ξ(x'_n - θ) and, consequently,
(m + n)^ξ(x'_n - θ), is asymptotically normal (0, σ^2c^2/2(βd - ξ)). Denoting this
normal distribution by D, we have that there is an n_2 in N such that if n > n_2
then |P{(m + n)^ξ(x'_n - θ) ≤ y} - D(y)| < ε and, hence, |P{(m + n)^ξ(x_{m+n} - θ)
≤ y} - D(y)| < 2ε. The desired result is implied.

THEOREM 4. Suppose the conditions of Theorem 3 hold. Then n^{1/2}R_n(x_n) is
asymptotically normal (0, σ^2β^2d^2/2(βd - ξ)).

PROOF. For all n > n_0, n^{1/2}R_n(x_n) = n^ξ(x_n - θ)T_n(x_n)n^{1/2-ξ}c_n + o(1)T_n(x_n). By
Cramér's theorem ([4], p. 254), the desired conclusion follows from the
conditions and conclusion of Theorem 3.


5. Asymptotic confidence intervals free of unknowns. We note that the
quantities σ^2 and β, each of which would probably be unknown in most practical
situations, appear in the variance of each of the asymptotic distributions given
above. In order to be able to construct sequences of estimates of these quantities
of the sort that would make it possible to obtain asymptotic confidence intervals
for θ free of unknowns, we proceed as follows:

With each stochastic approximation process of the type A_0 we associate a
random variable sequence {z̄_n} and a positive number sequence {π_n} such that
for each n in N, z̄_n is a random variable, conditionally independent of z_n, with
conditional distribution function G_n(· | x_n + π_n), given x_1, ..., x_n, z_1, ...,
z_{n-1}.

Also, to each {x_n} in A_i we associate a random variable sequence {ȳ_n} and
a positive number sequence {π_n} in an analogous manner, i = 1, 2, 3. For
instance, if {x_n} is in A_2, then for each n in N, ȳ_{2n-1} and ȳ_{2n} are random variables
which are conditionally independently distributed according to H^{[r_n]}(· | x_n -
c_n + π_n) and H^{[r_n]}(· | x_n + c_n + π_n), respectively, and which are conditionally
independent of y_{2n-1} and y_{2n}, given x_1, ..., x_n, y_1, ..., y_{2n-2}.

THEOREM 5. Suppose the conditions of Theorem 3 hold. Then s_n^2 = Σ_{j=1}^n z_j^2/n
→ σ^2 with probability one. If, in addition, π_n → 0 and 1/π_n = O(n^{ξ_0}) where
0 < ξ_0 < ξ, then t_n = Σ_{j=1}^n [(z̄_j - z_j)/(c_jπ_jn)] → β with probability one, which implies
that n^ξ|2t_nd - 2ξ|^{1/2}(x_n - θ)/(cs_n) is asymptotically normal (0, 1).

Thus, asymptotic confidence intervals for θ free of unknowns can be constructed.
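
The following Python sketch shows one way the interval implied by Theorem 5 might be computed from recorded data. The function name, argument layout, and the small numerical example are hypothetical; the formulas for s_n^2, t_n and the pivot are the ones stated in the theorem, with z̄_j denoting the auxiliary observation taken at x_j + π_j.

```python
import math
from statistics import NormalDist


def a0_confidence_interval(x_n, z, z_bar, c_seq, pi_seq, xi, c, d, level=0.95):
    """Interval for theta from n^xi |2 t_n d - 2 xi|^{1/2} (x_n - theta)/(c s_n) ~ N(0, 1).

    z[j], z_bar[j]: the observations z_j and the auxiliary observations at x_j + pi_j.
    c_seq[j], pi_seq[j]: the sequences c_j and pi_j used in the process.
    """
    n = len(z)
    s2 = sum(zj * zj for zj in z) / n                      # s_n^2, estimates sigma^2
    t = sum((zb - zj) / (cj * pj)                          # t_n, estimates beta
            for zb, zj, cj, pj in zip(z_bar, z, c_seq, pi_seq)) / n
    scale = c * math.sqrt(s2) / (n ** xi * math.sqrt(abs(2.0 * t * d - 2.0 * xi)))
    q = NormalDist().inv_cdf(0.5 + level / 2.0)            # two-sided normal quantile
    return x_n - q * scale, x_n + q * scale


# Tiny made-up data set, only to show the call signature.
lower, upper = a0_confidence_interval(
    0.3, z=[1.0, -1.0, 1.0, -1.0], z_bar=[2.0, 0.0, 2.0, 0.0],
    c_seq=[1.0] * 4, pi_seq=[0.5] * 4, xi=0.5, c=1.0, d=1.0,
)
print(lower, upper)
```

The interval is only asymptotically valid, so a sample of four points as in the example carries no statistical meaning; it serves to check the arithmetic.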

PROOF. The last remark follows, by Cramér's theorem ([4], p. 254), from the
conclusion of Theorem 3 and the fact that c^2s_n^2/2(t_nd - ξ) → c^2σ^2/2(βd - ξ)
with probability one. (Convergence in probability is, of course, enough.)

We will first show that (I) if, in addition to the above conditions, the conditions
of Theorem 2 hold, then s_n^2 → σ^2 with probability one and (x_n - θ)/π_n → 0 with
probability one.

Let v_n = z_n^2 - V_n(x_n) - R_n^2(x_n). Then v_n^2 ≤ 3[z_n^4 + V_n^2(x_n) + R_n^4(x_n)] and
Ev_n^2 = O(1), where the latter fact is obtained by using the results of calculations
made in the proof of Theorem 2. Hence, Σ (Ev_n^2/n^2) < ∞. Also, E(v_n | v_1, ...,
v_{n-1}) = 0 for all n. Thus, by Lemma 1, Σ_{j=1}^n v_j/n → 0 with probability one. Since
P{lim x_n = θ} = 1, V_n(x_n) + R_n^2(x_n) → σ^2 with probability one, implying that
P{lim s_n^2 = σ^2} = 1.

Let r > 1/(ξ - ξ_0). Then by Tchebycheff's inequality and relation (9) we
have that P{|x_n - θ|/π_n ≥ ε} ≤ β_n^{(r)}/(π_n^rε^r) = O(n^{-r(ξ-ξ_0)}) for each ε > 0. By
the Borel-Cantelli lemma, the latter part of (I) follows.

Using (I) and employing the truncation device used in the proof of Theorem
3, it is easily shown that (II) under the conditions of Theorem 5, s_n^2 → σ^2 with
probability one and (x_n - θ)/π_n → 0 with probability one.

An application of Lemma 1 gives that t_n - Σ_{j=1}^n ([R_j(x_j + π_j) - R_j(x_j)]/(c_jπ_jn))
→ 0 with probability one. For n > n_0, [R_n(x_n + π_n) - R_n(x_n)]/(c_nπ_n) = [T_n(x_n +
π_n) - T_n(x_n)](x_n - θ + θ - μ_n)/π_n + T_n(x_n + π_n). The first term of the right-hand
side converges to 0 with probability one and the second term converges to
β with probability one, using (II), the fact that (θ - μ_n)/π_n = O(n^{-η+ξ_0}) where
ξ_0 < ξ < η, and condition (ii) of Theorem 3. This completes the proof.


6. Approximation of a root of a regression equation. We will now sketch some
of the implications of the above results for processes of the type A_1, A_2, and A_3.

COROLLARY 3.1. Suppose {x_n} is a stochastic approximation process of the type
A_1, θ is a real number and each of α_1, σ^2, ε, c, and r_0 is a positive number such
that


(i) if x ≠ θ, then (x - θ)(M(x) - α) > 0;

(ii) M is differentiable at θ and M'(θ) = α_1;

(iii) sup_x [|M(x)|/(1 + |x|)] < ∞;

(iv) if 0 < δ_1 < δ_2 < ∞ then inf |M(x) - α| > 0 for δ_1 ≤ |x - θ| ≤ δ_2;

(v) V is bounded and is continuous at θ, and V(θ) = σ^2;

(vi) if r is in N, then sup E|Y(x) - M(x)|^r < ∞ for |x - θ| < ε;
(vii) H(y | ·) is Borel measurable for each y in R;
(viii) na_n → c > 1/2α_1, r_n → r_0.


Then n^{1/2}(x_n - θ) is asymptotically normal (0, σ^2c^2/r_0(2α_1c - 1)).
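
As a worked illustration (not part of the original text), the limiting variance in Corollary 3.1 can be viewed as a function of the constant c and minimized by elementary calculus; the symbol v below is introduced only for this computation.

```latex
\[
  v(c) = \frac{\sigma^2 c^2}{r_0\,(2\alpha_1 c - 1)}, \qquad c > \frac{1}{2\alpha_1},
\]
\[
  v'(c) = \frac{\sigma^2}{r_0}\cdot
          \frac{2c(2\alpha_1 c - 1) - 2\alpha_1 c^2}{(2\alpha_1 c - 1)^2}
        = \frac{\sigma^2}{r_0}\cdot
          \frac{2c(\alpha_1 c - 1)}{(2\alpha_1 c - 1)^2},
\]
\[
  v'(c) = 0 \iff c = \frac{1}{\alpha_1}, \qquad
  v\!\left(\frac{1}{\alpha_1}\right) = \frac{\sigma^2}{r_0\,\alpha_1^2}.
\]
```

Since v'(c) is negative for c < 1/α_1 and positive for c > 1/α_1, the choice c = 1/α_1 gives the smallest limiting variance within this family, although α_1 is typically unknown in practice.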

The first results of the above kind were obtained by Chung [3]. Later, Hodges 
and Lehmann [5] modified a result of Chung’s and obtained a weaker conclusion, 
asymptotic normality without knowledge of moments, under weaker conditions. 
Corollary 3.1 is essentially the same as the result of Hodges and Lehmann. Minor 
differences are as follows: here the possibility that lim inf_{|x|→∞} |M(x) - α| = 0 is
allowed, and assumptions about the moments of H(· | x) higher than the second
are made only for x in a neighborhood of θ. The main reason for stating
lary 3.1 here is for later use. 

The proof of Corollary 3.1 consists of showing that {x_n} is in A_0 and that the
conditions of Theorem 3 are satisfied. Except for establishing that condition 
(vii) of the corollary implies that condition (v) of Theorem 2 is satisfied, it is 
trivial. The exceptional part is taken care of at once by the following lemma. 

LEMMA 4. For each x in R let Y_1(x), ..., Y_m(x) be mutually independent random
variables where H_j(· | x) is the distribution function of Y_j(x), j = 1, ..., m, and
G(· | x) is the distribution function of Σ_{j=1}^m Y_j(x). Suppose that if y is in R, then
H_j(y | ·) is Borel measurable, j = 1, ..., m. Then for each y in R, G(y | ·) is
Borel measurable.

The method of proof is as follows. For each n in N express


Ḡ(y + 1/n | x) - Ḡ(y - n | x),


where 2Ḡ(v | x) = G(v - | x) + G(v + | x) for v in R, in terms of Lévy's
inversion formula for characteristic functions. Using Stieltjes approximating
sums and the assumptions of the lemma, it is not hard to show that


Ḡ(y + 1/n | ·) - Ḡ(y - n | ·)

is Borel measurable. Thus, since

Ḡ(y + 1/n | ·) - Ḡ(y - n | ·) → G(y | ·),


G(y | ·) is Borel measurable.
COROLLARY 4.1. Under the conditions of Corollary 3.1, n^{1/2}(M(x_n) - α) is
asymptotically normal (0, σ^2α_1^2c^2/r_0(2α_1c - 1)).

COROLLARY 5.1. Suppose the conditions of Corollary 3.1 hold. Then s_n^2 =
r_0 Σ_{j=1}^n (y_j - α)^2/n → σ^2 with probability one. If, in addition, π_n → 0 and
1/π_n = O(n^{ξ_0}), where 0 < ξ_0 < 1/2, then t_n = Σ_{j=1}^n [(ȳ_j - y_j)/(π_jn)] → α_1 with
probability one, which implies that n^{1/2}r_0^{1/2}|2t_nc - 1|^{1/2}(x_n - θ)/(cs_n) is
asymptotically normal (0, 1).

Of course, under all the conditions of the above corollary, n^{1/2}r_0^{1/2}|2t_nc - 1|^{1/2}
(M(x_n) - α)/(cs_n|t_n|) is asymptotically normal (0, 1), also. The results for
M(x_n) - α may be of some interest in practical applications.


7. Approximation of the location of the maximum of a regression function.
If θ is a real number let ℳ_θ be the set such that M is in ℳ_θ if and only if M
is a function from R into R such that either (i) M is increasing for x ≤ θ and
decreasing for x > θ, or (ii) M is increasing for x < θ and decreasing for x ≥ θ.

If M is in ℳ_θ, let μ be the function from the positive numbers into
R such that if ε > 0 then [x - μ(ε)][M(x - ε) - M(x + ε)] > 0 for all x ≠
μ(ε).

LEMMA 5. If M is in ℳ_θ, then the function μ exists and |μ(ε) - θ| ≤ ε for
each ε > 0.

PROOF. Suppose M satisfies (i) of the definition of ℳ_θ. (The other case is
handled similarly.) Then M(x - ε) - M(x + ε) is negative for x ≤ θ - ε,
positive for x > θ + ε, and increasing in x for x in (θ - ε, θ + ε]. This implies
the desired result.

If M is in ℳ_θ, then we will say that M is η-locally-even at θ if and only if η ≥ 0
and μ(ε) - θ = O(ε^{1+η}) as ε → 0.

By Lemma 5, if M is in ℳ_θ, then M is at least 0-locally-even at θ. Lemma 6,
below, indicates that each function in a fairly large subset of ℳ_θ is 1-locally-even
at θ. If M(θ - ε) = M(θ + ε) for each positive ε < δ then μ(ε) = θ
for 0 < ε < δ, implying that M is η-locally-even at θ for each η ≥ 0. How
locally-even M is at θ is an important factor affecting the asymptotic behavior
of type A_2 processes.

LEMMA 6. If M is in ℳ_θ and δ is a positive number such that the first three
derivatives of M exist on I = [θ - δ, θ + δ], M''(θ) ≠ 0, and either M^{(3)} is
continuous on I or is of bounded variation on I, then M is 1-locally-even at θ.

PROOF. In either case M(x) = M(θ) + M''(θ)(x - θ)^2/2 + O(|x - θ|^3)
as x → θ. Therefore, for 0 < ε < δ/2, 0 = M(μ(ε) - ε) - M(μ(ε) + ε) =
-2M''(θ)ε(μ(ε) - θ) + O(ε^3) as ε → 0. The desired result follows.
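
To make the function μ concrete, the following Python sketch locates μ(ε) by bisection on the interval given by Lemma 5. The function name mu, the iteration count, and the asymmetric example M(x) = 1 + x - e^x (maximum at θ = 0) are assumptions chosen for illustration; for this M a short calculation gives μ(ε) = log(ε / sinh ε), which is about -ε^2/6, in line with Lemma 6.

```python
import math


def mu(M, theta, eps, iterations=80):
    """Locate mu(eps), the sign change of x -> M(x - eps) - M(x + eps).

    Lemma 5 places mu(eps) in [theta - eps, theta + eps], so bisection is
    started on that interval.
    """
    def g(x):
        return M(x - eps) - M(x + eps)

    lo, hi = theta - eps, theta + eps
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if g(mid) < 0.0:
            lo = mid  # g is negative to the left of mu(eps)
        else:
            hi = mid
    return (lo + hi) / 2.0


def M(x):
    """Hypothetical asymmetric regression function with maximum at theta = 0."""
    return 1.0 + x - math.exp(x)


for eps in (0.2, 0.1, 0.05):
    # Compare the bisection result with the closed form log(eps / sinh(eps)).
    print(eps, mu(M, 0.0, eps), math.log(eps / math.sinh(eps)))
```

Halving ε should divide μ(ε) - θ by roughly four in this example, which is the O(ε^2) behavior that defines 1-local evenness.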

Corollary 3.2. Suppose $\{x_n\}$ is a stochastic approximation process of the type
$A_2$, $\theta$ is a real number, and each of $\alpha_2$, $\sigma$, $\epsilon_0$, $\xi$, $c$, $d$, and $r_0$ is a positive number
such that conditions (ii) and (iv) of Corollary 1.1 are satisfied and

(i) $M$ is in $\mathfrak{M}_\theta$, the first two derivatives of $M$ exist on an open interval con-
taining $\theta$, $M''$ is continuous at $\theta$, and $M''(\theta) = -\alpha_2$;
(ii) $V$ is bounded and is continuous at $\theta$, and $V(\theta) = \sigma^2$;
(iii) if $r$ is in $N$, then $\sup E|Y(x) - M(x)|^r < \infty$ for $|x - \theta| < \epsilon_0$;
(iv) $H(y \mid \cdot)$ is Borel measurable for each $y$ in $R$;
(v) $M$ is $\eta$-locally-even at $\theta$, and $0 < \xi < \tfrac12 - [1/(4 + 2\eta)]$, n®*4ah/ce, > 
C, Na, > d > £/2a2, 1%, > 10.


Then $n^{\xi}(x_n - \theta)$ is asymptotically normal $(0, \sigma^2c^2/r_0(2\alpha_2 d - \xi))$.
Here the $\mu_n$ of Theorem 3 is, of course, $\mu(c_n)$.

Corollary 3.2.1. Under the conditions of Corollary 3.2,

$$\lim_{n\to\infty} P\{2n^{2\xi}[M(\theta) - M(x_n)] \le x\alpha_2K^2\} = \int_0^x (2\pi t)^{-1/2} e^{-t/2}\,dt$$

for each positive number $x$, where $K^2 = \sigma^2c^2/r_0(2\alpha_2 d - \xi)$.

Proof. The assumptions imply that

$$M(x) = M(\theta) - \alpha_2(x - \theta)^2/2 + o(|x - \theta|^2)$$

as $x \to \theta$. Thus,

$$2n^{2\xi}[M(\theta) - M(x_n)]/\alpha_2K^2 = [n^{\xi}(x_n - \theta)/K]^2 + o(1)\,n^{2\xi}(x_n - \theta)^2.$$

The desired result follows easily from the conclusion of Corollary 3.2.

Corollary 3.1.1. Suppose $\{x_n\}$ is a stochastic approximation process of the
type $A_2$, $\theta$ is a real number, and each of $\epsilon$, $\alpha_1$, $\sigma^2$, $c$, and $r_0$ is a positive number
such that condition (iv) of Corollary 3.2 holds, and

(i) $M$ is in $\mathfrak{M}_\theta$, $M$ is differentiable at $\theta_1 - \epsilon$ and $\theta_1 + \epsilon$, and

$$M'(\theta_1 - \epsilon) - M'(\theta_1 + \epsilon) = 2\alpha_1,$$

where $\theta_1$ denotes $\mu(\epsilon)$;
(ii) $\sup_x [\,|M(x - \epsilon) - M(x + \epsilon)|/(1 + |x|)\,] < \infty$;
(iii) $V$ is bounded and is continuous at $\theta_1 - \epsilon$ and $\theta_1 + \epsilon$, and

$$V(\theta_1 - \epsilon) + V(\theta_1 + \epsilon) = 2\sigma^2;$$

(iv) there is an open real number set $J$ containing $\theta_1 - \epsilon$ and $\theta_1 + \epsilon$ such that
if $r$ is in $N$, then $\sup E|Y(x) - M(x)|^r < \infty$ for $x$ in $J$;
(v) $na_n \to c > 1/4\alpha_1$, $r_n \to r_0$, and if $n$ is in $N$, then $c_n = \epsilon$.

Then $n^{1/2}(x_n - \mu(\epsilon))$ is asymptotically normal $(0, 2\sigma^2c^2/r_0(4\alpha_1c - 1))$.

Here the conclusion is given in terms of $\mu(\epsilon)$ rather than in terms of $\theta$. Never-
theless, the inequality $|\mu(\epsilon) - \theta| \le \epsilon$ holds. Also, if $M(\theta - \epsilon) = M(\theta + \epsilon)$,
then $\mu(\epsilon) = \theta$. The main point of the above result is that $n^{\xi}$ in the conclusion
of Corollary 3.2 is replaced by $n^{1/2}$ in the above conclusion. We remember that
$\xi < \tfrac12$ in Corollary 3.2. It is desirable, of course, to have the exponent of $n$
as large as possible.

The class of processes satisfying the conditions of 3.1.1 is a subclass of $A_1$.
Once this is observed, the above result follows easily from Corollary 3.1.

We note that results analogous to Corollary 5.1 can easily be written down 
for the cases considered in Corollaries 3.2 and 3.1.1. 
8. Approximation of the mode of a density function. Again, instead of con-
sidering the general point-of-inflection problem, we limit ourselves to the special
case of the approximation of the mode of a density function. Results for the
more general situation can be obtained in the same way.

If $M$ is a distribution function with associated density function $f$ and $f$ is in
$\mathfrak{M}_\theta$, let $\lambda_M = \lambda$ be the function from the positive numbers into $R$ such that if
$\epsilon > 0$, then $[x - \lambda(\epsilon)][2M(x) - M(x - \epsilon) - M(x + \epsilon)] > 0$ for all $x \ne \lambda(\epsilon)$.

Lemma 7. If $M$ and $f$ are as above, then the function $\lambda$ exists and $|\lambda(\epsilon) - \theta| \le \epsilon$
for each $\epsilon > 0$.

Proof. Let $M_\epsilon(x) = \int_{x-\epsilon}^{x+\epsilon} f(t)\,dt$ for all $x$, $\epsilon$. Then for each $\epsilon > 0$, $M_\epsilon$ is in
$\mathfrak{M}_{\mu_f(\epsilon)}$. Also,

$$2M(x) - M(x - \epsilon) - M(x + \epsilon) = M_{\epsilon/2}(x - \epsilon/2) - M_{\epsilon/2}(x + \epsilon/2).$$

Let $\lambda(\epsilon) = \mu_{M_{\epsilon/2}}(\epsilon/2)$. The desired result follows by using Lemma 5.

Lemma 8. If $M$ and $f$ are as above and, in addition, there is a $\delta > 0$ such that
the first three derivatives of $f$ exist on $I = [\theta - \delta, \theta + \delta]$, $f''(\theta) \ne 0$ and either
$f^{(3)}$ is continuous on $I$ or is of bounded variation on $I$, then $\lambda(\epsilon) - \theta = O(\epsilon^2)$
as $\epsilon \to 0$.

The proof is similar to that of Lemma 6.

Corollary 3.3. Suppose $\{x_n\}$ is a stochastic approximation process of the
type $A_3$, where $M$ is a distribution function with associated density function $f$
and $H(\cdot \mid x) = B(\cdot \mid M(x))$ for $x$ in $R$, $\theta$ is a real number, $\eta$ is a nonnegative
number, and each of $\alpha_3$, $\xi$, $c$, $d$, and $r_0$ is a positive number such that condition
(ii) of Corollary 1.2 is satisfied, and

(i) $f$ is in $\mathfrak{M}_\theta$, the first two derivatives of $f$ exist on an open interval $I$ containing
$\theta$, $f''$ is continuous on $I$, and $f''(\theta) = -\alpha_3$;

(ii) A(e) — 6 = O(™") as e 30,0 <E <4 —1/(3 + 2), n'*an/c > c, 
na, —-d> 2t/a3, Tn — To. 


Then $n^{\xi}(x_n - \theta)$ is asymptotically normal $(0, 3\sigma^2c^2/2r_0(\alpha_3 d - 2\xi))$, where $\sigma^2 =
M(\theta)(1 - M(\theta))$.

The asymptotic distribution of $2n^{2\xi}[f(\theta) - f(x_n)]$ can be derived using methods
similar to those used in proving Corollary 3.2.1. Also, a corollary to Theorem 5
for this case can be written down.
ON SEQUENTIAL DESIGNS FOR MAXIMIZING THE SUM
OF n OBSERVATIONS¹


By R. N. Bradt, S. M. Johnson, and S. Karlin


1. Introduction. An important simple type of sequential design problem is
as follows: We have two binomial random variables, $X$ and $Y$, having param-
eters under the two hypotheses, $H_1$ and $H_2$, given by

$$\begin{array}{llcc} & & X & Y \\ (\zeta) & H_1 & p & q \\ (1-\zeta) & H_2 & q & p, \end{array}$$

where $\zeta$ is the a priori probability that $H_1$ is true. We wish to maximize the
sum of $n$ observations. The procedure for selecting an $X$ or $Y$ observation at
each stage, of course, takes account of all the previous history.

A more realistic version of the design problem deals with the situation such
that $X$ and $Y$ have parameters $p$ and $q$, respectively, where an a priori distribu-
tion $F(p, q)$ is known. The problem holds interest for several reasons. It would
appear to be one of the simplest problems in the sequential design of an ex- 
periment that can be posed; hence its analysis is a step towards obtaining a 
body of information relative to specific sequential design problems. It has not 
only this general interest but also, as it stands, it has applications in particular 
problems such as learning theory, biology, and medicine; see [1], for instance, 
in which applications in the latter two fields may be found. A discussion of 
problems of this general variety and of certain strategies has been published 
by Robbins [2]. More immediately, in the final section of this paper it is shown 
that the solution to the problem in which p has a priori distribution F and q 
is assumed known, explicitly obtained in Section 4, yields directly the solution 
of a problem in industrial inspection. 

The type of problem known as the “Two-armed Bandit” is a special case of
the preceding. In its “classical” formulation (whence the name), we have a slot
machine with two arms, an $X$-arm and a $Y$-arm. When either arm is pulled,
the machine pays off either one unit or nothing; and the probability of winning
with one arm is $p$, and, with the other, $q$. A priori it is unknown which is which,
but the probability $\zeta$ that it is the $X$-arm which has probability $p$ of success is
assumed known. One is allowed $n$ plays, and a sequential design, or strategy, is
desired which will maximize the expected winnings.

We shall use here for intuitive concreteness the gambling interpretation and 
terminology. 

It has been conjectured for this problem that the optimal strategy is $S_1$: on
each play choose the arm having, at that time, the maximum expected proba-
bility of paying off, i.e., play each time as though there were but one play re-
maining. This conjecture has been verified to hold for $n \le 8$.

Received Sept. 28, 1954.

The “Two-armed Bandit” problem can be generalized in two directions. The
random variables may have distributions other than binomial. Sufficient condi-
tions that $S_1$ be optimal are given and it is shown that, for the binomial case
following $S_1$, the expected winnings per play tend to $\max[p, q]$ as $n \to \infty$.

The other direction of generalization was the problem as originally introduced:
$X$ and $Y$ are binomial with parameters $p$ and $q$ having a priori distribution
$F(p, q)$. In Section 3 it is shown that several properties which one intuitively
expects the optimum strategy to possess are not, in general, characteristics of
the optimal strategy; e.g., for $p$ and $q$ independently distributed, if only $n$ is
sufficiently large, $S_1$ is not optimal; the optimal strategy may not stay on a
winner; and the expected gain on the $r$-th play is not necessarily a nondecreas-
ing function of $r$. Also, $S_k$, the strategy which maximizes the expected winnings
over the next $k$ plays, is not always an improvement over $S_{k-1}$.

In Section 4, the parameter $q$ and the a priori distribution $F(p)$ are assumed
known. In this case the optimal strategy is determined explicitly and is shown
to have those intuitive properties which have been previously noted not to be
general characteristics of optimal strategies. These results are applied, in the
final section, to obtain the optimal procedure in a certain industrial inspection
problem.


2. The “Two-armed Bandit.” 2.1. The statistical problem which goes under
this general title is that of finding a design which will maximize the sum of $n$
independent observations in the following situation: Let $X$ and $Y$ be real-
valued random variables having cdf's $F_i$ and $G_i$, respectively, under hypothesis
$H_i$ $(i = 1, 2)$ and $\zeta$ be the a priori probability that $H_1$ is the true hypothesis.
The problem is to devise a sequential design which will maximize the expected
value of the sum of $n$ observations, each of which is to be an observation either
of $X$ or of $Y$.

Let $f_i$ and $g_i$ be the densities corresponding to $F_i$ and $G_i$ with respect to the
measure $\mu$. Let $W_n(\zeta, S^*)$ denote the expected value of the sum of the $n$ ob-
servations if $\zeta$ is the a priori probability for $H_1$ and the design, $S^*$, is used. If one
observed $X$ first and then continued for $n - 1$ steps following the optimal rule
$S^*$, then the expected sum would be

$$A_n = \zeta\int t f_1(t)\,d\mu + (1-\zeta)\int t f_2(t)\,d\mu + \int W_{n-1}\!\left(\frac{\zeta f_1(t)}{\zeta f_1(t) + (1-\zeta) f_2(t)},\, S^*\right)[\zeta f_1(t) + (1-\zeta) f_2(t)]\,d\mu. \tag{2.1.1}$$

Similarly, if $Y$ were observed first and the optimal rule followed for the re-
maining $n - 1$ steps, the expected sum would be

$$B_n = \zeta\int t g_1(t)\,d\mu + (1-\zeta)\int t g_2(t)\,d\mu + \int W_{n-1}\!\left(\frac{\zeta g_1(t)}{\zeta g_1(t) + (1-\zeta) g_2(t)},\, S^*\right)[\zeta g_1(t) + (1-\zeta) g_2(t)]\,d\mu. \tag{2.1.2}$$

Hence, $W_n(\zeta, S^*) = \max(A_n, B_n)$.

A natural design to be considered is that which requires that one maximize
step by step; i.e., after the $j$-th observation, the a posteriori probability, $\zeta_j$,
is computed; and at the next step, the random variable corresponding to the
maximum of

$$\int t[\zeta_j f_1(t) + (1 - \zeta_j) f_2(t)]\,d\mu \quad\text{and}\quad \int t[\zeta_j g_1(t) + (1 - \zeta_j) g_2(t)]\,d\mu$$

is observed. Denote this stepwise maximization design by $S_1$.

Theorem 2.1. If the likelihood ratios $f_2/f_1$ and $g_2/g_1$ have the same distributions
under $H_1$ and also under $H_2$, then $S_1$ is the optimal design.
Proof. Since

$$\frac{1 - \zeta_1}{\zeta_1} = \begin{cases} \dfrac{1-\zeta}{\zeta}\,\dfrac{f_2(x)}{f_1(x)} & \text{if $X$ is observed first,} \\[2ex] \dfrac{1-\zeta}{\zeta}\,\dfrac{g_2(x)}{g_1(x)} & \text{if $Y$ is observed first,} \end{cases}$$

and the likelihood ratios have the same distributions, the distribution of $\zeta_1$ is
independent of which random variable is observed first. Hence, the expected
value of the optimal yield from the last $(n - 1)$ steps is independent of the
choice for the first step. One can, therefore, maximize the expected sum of $n$
observations by choosing at the first step the random variable having the larger
expected value and continuing with the optimal design for the remaining steps.
Since all the random variables are assumed to be independent, the same
argument shows that, given $\zeta_j$, it is optimal to follow $S_1$ for the $(j + 1)$-st step.

An example in which the likelihood ratios are distributed alike is:

$$\begin{array}{lcc} & X & Y \\ H_1 & N(0, 1) & N(\nu, 1) \\ H_2 & N(\mu, 1) & N(0, 1) \end{array} \qquad (\mu > 0,\ \nu > 0),$$

with $\mu = \nu$. However, it can be shown that for $n = 2$ and $(1 - \zeta)\mu = \zeta\nu$, $S_1$
is optimal only if $\mu = \nu$.
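The claim that the likelihood ratios in the normal example are distributed alike exactly when the two means coincide can be checked directly. The following worked computation is a sketch added for illustration; writing the ratios in logarithmic form is a convenience and not notation from the paper.

```latex
\begin{aligned}
\log\frac{f_2(X)}{f_1(X)} &= \mu X - \tfrac12\mu^2, &
\log\frac{g_2(Y)}{g_1(Y)} &= -\nu Y + \tfrac12\nu^2, \\[1ex]
\text{under } H_1:\quad
\log\frac{f_2(X)}{f_1(X)} &\sim N\!\left(-\tfrac12\mu^2,\ \mu^2\right), &
\log\frac{g_2(Y)}{g_1(Y)} &\sim N\!\left(-\tfrac12\nu^2,\ \nu^2\right), \\[1ex]
\text{under } H_2:\quad
\log\frac{f_2(X)}{f_1(X)} &\sim N\!\left(+\tfrac12\mu^2,\ \mu^2\right), &
\log\frac{g_2(Y)}{g_1(Y)} &\sim N\!\left(+\tfrac12\nu^2,\ \nu^2\right).
\end{aligned}
```

Since $\mu$ and $\nu$ are positive, the two columns agree under each hypothesis if and only if $\mu = \nu$, which is the condition of Theorem 2.1 for this example.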

2.2. A special case of the “Two-armed Bandit” of widespread interest (the
“classical” case) is that in which the random variables have binomial distribu-
tions with parameters given by:

$$\begin{array}{lcc} & X & Y \\ H_1 & p & q \\ H_2 & q & p. \end{array}$$

A second example in which the likelihood ratios are distributed alike is fur-
nished here if $p + q = 1$. Hence, for that case, $S_1$ is the optimal design. Indeed,
it is a conjecture that for any choice of $p$ and $q$, $S_1$ is optimal; it has been verified
to be for $n \le 8$. Optimal or not, $S_1$ has the desirable property of being con-
sistent; i.e., Theorem 2.2 holds.

Theorem 2.2. Following the design $S_1$, the expected value of the average of the
first $n$ observations converges to $\max(p, q)$ as $n \to \infty$.

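Proof. Assume $p > q$. Then

$$W_1(\zeta, S_1) = \begin{cases} q + (p - q)\zeta & \text{for } \zeta \ge \tfrac12, \\ p - (p - q)\zeta & \text{for } \zeta \le \tfrac12; \end{cases} \tag{1}$$

and if $\zeta \ge \tfrac12$,

$$W_n(\zeta, S_1) = W_1(\zeta, S_1) + W_{n-1}\!\left(\frac{p\zeta}{P_\zeta(X = 1)}, S_1\right) P_\zeta(X = 1) + W_{n-1}\!\left(\frac{(1 - p)\zeta}{P_\zeta(X = 0)}, S_1\right) P_\zeta(X = 0), \tag{2}$$

while if $\zeta < \tfrac12$,

$$W_n(\zeta, S_1) = W_1(\zeta, S_1) + W_{n-1}\!\left(\frac{q\zeta}{P_\zeta(Y = 1)}, S_1\right) P_\zeta(Y = 1) + W_{n-1}\!\left(\frac{(1 - q)\zeta}{P_\zeta(Y = 0)}, S_1\right) P_\zeta(Y = 0), \tag{3}$$

where $P_\zeta(Z = c) = \zeta P(Z = c \mid H_1) + (1 - \zeta)P(Z = c \mid H_2)$.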

$W_1$ is clearly convex, symmetric about $\zeta = \tfrac12$, and continuous. So is $W_n$,
since by an inductive argument, $W_n$ is symmetric about $\tfrac12$, (2) and (3) are con-
tinuous, and each (by formal differentiation twice) is convex. Also it is easily
seen that $W_n(\zeta, S_1) = n[(p - q)\zeta + q]$ for $\zeta$ near 1.

Let $a_n(\zeta, S_1) = (1/n)W_n(\zeta, S_1)$. Then $a_n$ is convex, continuous, and bounded
above by $p$ on $[0, 1]$. Furthermore, $|a_n'(\zeta, S_1)| \le p - q$. As a consequence of a
more general result below, Lemma 3.3, $a_n$ is nondecreasing in $n$. Hence, $a(\zeta, S_1) =
\lim_{n\to\infty} a_n(\zeta, S_1)$ exists and is convex and continuous on $[0, 1]$. Moreover, since
$na_n(\zeta, S_1)$ satisfies the recursions (2) and (3),

$$a(\zeta, S_1) = \begin{cases} a\!\left(\dfrac{p\zeta}{P_\zeta(X = 1)}, S_1\right) P_\zeta(X = 1) + a\!\left(\dfrac{(1 - p)\zeta}{P_\zeta(X = 0)}, S_1\right) P_\zeta(X = 0), & \zeta \ge \tfrac12, \\[2ex] a\!\left(\dfrac{q\zeta}{P_\zeta(Y = 1)}, S_1\right) P_\zeta(Y = 1) + a\!\left(\dfrac{(1 - q)\zeta}{P_\zeta(Y = 0)}, S_1\right) P_\zeta(Y = 0), & \zeta < \tfrac12. \end{cases} \tag{4}$$
Suppose that the minimum of $a(\zeta, S_1)$ is assumed at $\zeta_0 \ge \tfrac12$. Then it also
assumes its minimum at $p\zeta_0/P_{\zeta_0}(X = 1) > \zeta_0$. By iteration, it assumes its
minimum at $p^n\zeta_0/[p^n\zeta_0 + q^n(1 - \zeta_0)]$, which tends to 1 as $n \to \infty$. Hence,
$\zeta_0$ could be taken to be 1. If, on the other hand, $\zeta_0 < \tfrac12$, the analogous procedure
shows that $\zeta_0$ could be taken to be 0. Thus, the minimum of $a(\zeta, S_1)$ is assumed
either at 0 or 1. But $a(0, S_1) = a(1, S_1) = p$, which establishes the theorem.
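The recursion for $W_n(\zeta, S_1)$ in the classical binomial case can be evaluated exactly for moderate $n$, which gives a concrete view of Theorem 2.2. The sketch below is an illustration only: the function name `greedy_value` and the bookkeeping of the posterior through integer exponents of the two likelihood ratios are choices made here, not taken from the paper. It assumes $0 < \zeta < 1$, $0 < p, q < 1$, and $p \ne q$.

```python
from functools import lru_cache


def greedy_value(n, zeta, p, q):
    """Expected total winnings W_n(zeta, S_1) under the one-step-ahead rule S_1.

    Under H_1 the X-arm succeeds with probability p and the Y-arm with q;
    under H_2 the roles are exchanged.  The posterior odds for H_1 equal
    odds0 * (p/q)**a * ((1-p)/(1-q))**b, so the integers (a, b) summarize
    the whole history.
    """
    odds0 = zeta / (1.0 - zeta)

    @lru_cache(maxsize=None)
    def w(k, a, b):
        if k == 0:
            return 0.0
        odds = odds0 * (p / q) ** a * ((1 - p) / (1 - q)) ** b
        z = odds / (1.0 + odds)              # current posterior probability of H_1
        mean_x = z * p + (1 - z) * q         # P(X = 1) at this posterior
        mean_y = z * q + (1 - z) * p         # P(Y = 1) at this posterior
        if mean_x >= mean_y:                 # S_1 plays X
            return (mean_x * (1.0 + w(k - 1, a + 1, b))
                    + (1.0 - mean_x) * w(k - 1, a, b + 1))
        # S_1 plays Y: a success multiplies the odds by q/p, a failure by (1-q)/(1-p)
        return (mean_y * (1.0 + w(k - 1, a - 1, b))
                + (1.0 - mean_y) * w(k - 1, a, b - 1))

    return w(n, 0, 0)
```

For example, with $p = 0.7$, $q = 0.4$, and $\zeta = 0.7$, the averages `greedy_value(n, 0.7, 0.7, 0.4) / n` are nondecreasing in `n` and stay below $\max(p, q) = 0.7$, in line with the convergence the theorem asserts.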


3. A generalized “Two-armed Bandit.”² This section is concerned with the
Bayes problem of maximizing the expected number of successes in $n$ trials
when at each trial we are free to choose between two binomial random vari-
ables, $X$ and $Y$, whose probabilities of success, $p$ and $q$, respectively, are un-
known, but a known a priori distribution, $F(p, q)$, is specified.

The special case where $F(p, q)$ concentrates at the two fixed points $(p, q)$ and
$(q, p)$ with probabilities $\zeta$ and $1 - \zeta$, respectively, leads to the “classical”
problem considered in Section 2.2.

Let $S$ denote a strategy for choosing between $X$ and $Y$ and $W_n[(p, q), S]$ de-
note the expected number of successes in following $S$ for $n$ plays for given $p$
and $q$. Then the expected number of successes is

$$W_n(F, S) = \int_0^1\!\!\int_0^1 W_n[(p, q), S]\,dF(p, q). \tag{3.1}$$

We will find it convenient sometimes to express this as $W_n(dF, S)$. The best
strategy is the one maximizing $W_n(F, S)$. Since $n$ is finite, the maximum exists.

Example 1. Suppose $p + q = 1$ with probability 1; i.e., $F(p, q)$ is of the form
$F(p, 1 - p)$. In this case a success or failure of $X$ is equivalent to (gives the
same information as) a failure or success with $Y$, respectively.

$S_1$ is optimal in this case; for let $F_{k-1}(p, q)$ denote the a posteriori probability
after $(k - 1)$ plays and $S_{n-k}$ the optimal strategy for $(n - k)$ plays. Then using
$X$ followed by $S_{n-k}$ yields

$$\int p\,dF_{k-1} + W_{n-k}(p\,dF_{k-1}, S_{n-k})\int p\,dF_{k-1} + W_{n-k}((1 - p)\,dF_{k-1}, S_{n-k})\int (1 - p)\,dF_{k-1},$$

and using $Y$ followed by $S_{n-k}$ yields

$$\int q\,dF_{k-1} + W_{n-k}(q\,dF_{k-1}, S_{n-k})\int q\,dF_{k-1} + W_{n-k}((1 - q)\,dF_{k-1}, S_{n-k})\int (1 - q)\,dF_{k-1}.$$

Since $q = 1 - p$, $X$ is the optimal play if and only if $\int p\,dF_{k-1} \ge \int q\,dF_{k-1}$;
i.e., $S_1$ is optimal. This example is related to the result of Theorem 2.1, which
embraces a special case of $F(p, q) = F(p, 1 - p)$.

²This section represents an extension of some preliminary work by S. Johnson and
S. Karlin at The RAND Corporation.

3.1. Our first task is to obtain the complete strategy for $n = 2$ when the a
priori distribution is $F(p)G(q)$. It is important to notice that, if the number of
trials is $n$, then only designs which are functions of the first $n$ moments, $\mu_1, \ldots,
\mu_n$ of $F$ and $\mu_1', \ldots, \mu_n'$ of $G$, need be considered. This is a consequence of the
fact that the expected yield for any strategy is an expression involving at most
these moments. Thus all strategies describing a first move can be expressed in
terms of functions $T_1(\mu_1, \ldots, \mu_n, \mu_1', \ldots, \mu_n')$ such that if

$$T_1(\mu_1, \ldots, \mu_n, \mu_1', \ldots, \mu_n') \le 0,$$

then $X$ is chosen at the first trial; otherwise $Y$ is used first.

Suppose for definiteness that $\mu_1 \ge \mu_1'$; we determine necessary and sufficient
conditions that $X$ be used first when $n = 2$. Using the fact that on the last trial
one chooses the random variable having greatest expected value, if $X$ is used
first the expected yield is

$$\mu_1 + \mu_2 + (1 - \mu_1)\max\left\{\frac{\mu_1 - \mu_2}{1 - \mu_1},\ \mu_1'\right\}. \tag{1}$$

Since $(1) \ge 2\mu_1 \ge \mu_1 + \mu_1' \ge 2\mu_1'$, $X$ followed by optimal is better than $Y$ fol-
lowed always by $X$, which is better than $Y$ followed always by $Y$. Of the other
two strategies starting with $Y$, the one requiring $X$ if $Y = 1$ has expected yield
$2\mu_1' + \mu_1\mu_1' - \mu_2'$, which can be shown to be less than or equal to that for the
strategy requiring $Y$ if $Y = 1$, namely,

$$\mu_1' + \mu_2' + (1 - \mu_1')\mu_1. \tag{2}$$

Upon comparison, $(2) \le (1)$ if and only if

$$\text{either } \mu_2 \ge \mu_2' \ \text{ or } \ \mu_1 + \mu_1\mu_1' \ge \mu_1' + \mu_2'. \tag{3.1.1}$$


Combining and rewriting in a symmetric form, we have

Lemma 3.1. If $n = 2$ and $p$ and $q$ are independent with moments $\mu_i$ and $\mu_i'$,
then $X$ is used on the first trial if and only if

$$\max\{\mu_2 - \mu_1\mu_1',\ \mu_1 - \mu_1'\} \ge \max\{\mu_2' - \mu_1\mu_1',\ \mu_1' - \mu_1\}.$$


Our next theorem shows that in almost all circumstances of independent a
priori distributions for $p$ and $q$, the optimal design and $S_1$ cannot agree.

Theorem 3.1. If $p$ and $q$ are independent with a priori distributions $F(p) =
\int_0^p \phi(t)\,dt$ and $G(q) = \int_0^q \psi(t)\,dt$, where $\phi$ and $\psi$ are continuous and positive for
$0 < t < 1$, then there exists an $n$ such that for $n$ trials, the optimal design does
not agree with $S_1$.

Proof. (By contradiction.) Suppose for definiteness that $1 > \int_0^1 t\phi(t)\,dt =
b \ge a = \int_0^1 t\psi(t)\,dt$. According to $S_1$ it is clear that $X$ is used first and, by the
Schwarz inequality, that we do not change random variables if a success occurs.

It is easily shown, in view of the hypothesis on $\phi$, that if $r$ and $s$ tend to in-
finity so that $r/(r + s) \to t_0$, then

$$\frac{\int_0^1 t^{r+1}(1 - t)^s\phi(t)\,dt}{\int_0^1 t^{r}(1 - t)^s\phi(t)\,dt} \to t_0. \tag{1}$$

(This can also be obtained as a consequence of the law of large numbers where
the relative frequency of success tends to $t_0$.) Hence, taking $t_0 > a + \epsilon/2$, we
can choose $r$ and $s$ so that


[ (1 — 2)*o(t) dt 


i ——— >a forz = 0 and 1. 
| tic, — 2)"g(t) dt 
0 


Furthermore, « may be chosen sufficiently small that also 


2 
| t'¢(t) dt > | / to(t) at| + 3e. 


Now let $n = r + s + 2$ and suppose that the first $r$ plays resulted in suc-
cesses with $X$ and the next $s$ plays were failures with $X$. This agrees with the
procedure prescribed by $S_1$ and has positive probability of occurrence. There
are now two plays left and, from (2), $S_1$ requires $X$ on the next step. However,
Lemma 3.1 gives necessary and sufficient conditions that the use of $X$ is opti-
mal and we show that these are violated.

There are two steps left and the a posteriori probability distribution is 
F'(p)G(q) where 


p’(1 — p)*o(p) 


F'(p) 1 
| t'(1 — t)*¢(t) dt 
0 


and Gq) = [ " y(t) dt. 
0 


On account of (2), 


1 1 
/ p dF'(p) > | q¥(q) dq, 
“0 0 


and on direct calculation it is seen that both of the inequalities of (3.1.1) are
violated. Hence, following $S_1$, we arrive at a nonoptimal yield, and the theorem
is established.

3.2. $S_1$ may be described as that procedure requiring at each step the random
variable that would be optimal, were there but one trial remaining. In a similar
spirit, let $S_j$ be the strategy which requires at each trial the random variable
that would be optimal were there $j$ trials remaining, with the understanding
that if fewer than $j$ trials remain, then the optimal procedure is followed. (The
strategy $S_2$ for $p$ and $q$ independent is determined by the relations given in
Lemma 3.1.)

We have, thus, a sequence of strategies, $S_1, S_2, \ldots, S_n$. For a series of $n$
trials, $S_n$ is the optimal strategy and hence $W_n(F, S_n) \ge W_n(F, S_j)$ for all
$j < n$. Intuitively one might expect that the $W_n(F, S_j)$ are nondecreasing in
$j$; i.e., the more steps ahead we take into account, the better the strategy. How-
ever, it can be shown that there exist a priori distributions such that for $n = 3$,
$W_3(F, S_1) > W_3(F, S_2)$. The details are omitted.

3.3. The next principle examined is that of “staying on a winner”: Does the
optimal strategy have the property that whenever a success occurs, the same
random variable is required on the next trial? $S_1$, for instance, has this property.
However, it is not always a characteristic of an optimal strategy, as the follow-
ing example shows.

Suppose $F(p, q)$ concentrates probability 0.8 on $(0.1, 0)$ and 0.2 on $(0.9, 1)$.
It can be shown that for this example we must stay on a loser but switch from
a winner for the case of $n = 2$.

A property related to the intuitive notion of staying on a winner is that of
“monotonicity,” which we discuss for $p$ and $q$ independent. Let $S^*(n, FG)$
denote the optimal strategy for $n$ trials against a priori $FG$ and let $dF' =
p\,dF/\int p\,dF$ and $dF'' = (1 - p)\,dF/\int (1 - p)\,dF$, with $G'$ and $G''$ similarly
defined. $S^*(n, FG)$ will be termed monotone if:

(i) S*(n, F’G) allows X first = S*(n, F’G) requires X first, 

(ii) S*(n, FG’) allows Y first > S*(n, FG") requires Y first, 
and 

(iii) S*(n, F’G) allows Y first = S*(n, FG") requires Y first. 

For instance, (i) is to be thought of as: if a prior “free” observation of $X$ were
allowed, then, if we might use $X$ on the first trial, even if $X$ had failed on the
prior trial, we should certainly use $X$ on the first trial, if the prior trial had
resulted in a success with $X$.

Lemma 3.2. If for $1 \le k \le n - 1$ and for all $F$ and $G$, $S^*(k, FG)$ is mono-
tone, then $S^*(n, FG)$ stays on a winner.

The proof is omitted. 

With the aid of the results of Lemma 3.1, it can be verified directly that, for 
n = 1 and n = 2, the optimal strategy is monotone and, therefore, for n = 3 
and p and q independent, the optimal strategy stays on a winner. 

The general monotonicity property for independent parameters remains an 
open question. 

3.4. In using any strategy, $S$, let $Z_r = 1$ if the random variable used on the
$r$-th trial wins, and $Z_r = 0$ otherwise; i.e., $Z_r$ is the contribution of the $r$-th
trial. The last property considered is whether, for $S^*$ denoting the optimal $S$,
$E[Z_r \mid S^*]$ is monotone increasing in $r$. We show first that $E[Z_r \mid S_1]$ is non-
decreasing in $r$, and second that $E[Z_r \mid S^*]$ may decrease.

Lemma 3.3. $E[Z_r \mid S_1]$ is nondecreasing in $r$ for every initial distribution $F$.

Proof. It is enough to prove the result for the first two trials. Suppose
$Z_1 = X$, then $E[Z_1 \mid S_1] = \int p\,dF$. But $E[Z_2 \mid S_1] = E[E[Z_2 \mid Z_1, S_1]] \ge
E[E[X \mid Z_1, S_1]] = E[Z_1 \mid S_1]$.

In contrast to this result, consider the case of $n = 3$,

$$dF(p) = p^5(1 - p)^3\,dp \Big/ \int_0^1 p^5(1 - p)^3\,dp,$$

and $G(q) = q$. For optimal return, $X$ should be employed first with expected
return from the first trial of 0.6. If success results, then $X$ is used again, while
if failure occurs, then the criteria of Lemma 3.1 require $Y$. The a priori expected
yield from the second trial is $64/110 < 66/110 = 0.6$.
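The two expected yields quoted in this counterexample can be reproduced with exact arithmetic. The sketch below assumes that the prior on $p$ has density proportional to $p^5(1 - p)^3$, that is, a Beta(6, 4) distribution; these exponents are inferred here from the stated values 0.6 and 64/110 and should be read as an assumption. The prior on $q$ is uniform, as in the text.

```python
from fractions import Fraction


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)


# Assumed prior on p: Beta(6, 4), density proportional to p^5 (1 - p)^3.
PRIOR_P = (6, 4)
MEAN_Q = Fraction(1, 2)  # G(q) = q is the uniform distribution on [0, 1]

# Trial 1 uses X, so its expected yield is the prior mean of p.
e_z1 = beta_mean(*PRIOR_P)

# Trial 2: after a success X is used again (posterior Beta(7, 4));
# after a failure the text states that Y is required.
mean_after_success = beta_mean(PRIOR_P[0] + 1, PRIOR_P[1])
e_z2 = e_z1 * mean_after_success + (1 - e_z1) * MEAN_Q
```

Under these assumptions `e_z1` equals 3/5 and `e_z2` equals 32/55 (that is, 64/110), so the expected yield falls from the first trial to the second, as the text asserts.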


4. The case of one known and one random probability of success. In this
section we examine in detail the situation in which $X$ has a binomial distribution
with $p = \Pr(X = 1)$ unknown but selected by a known a priori distribution,
$F$, while $Y$ has a binomial distribution with known parameter, $q$.

The results of the preceding section were informative largely in a negative 
sense; there are many nice properties which optimal strategies do not possess. 
Many properties which seemed obvious but which were not in general enjoyed 
by optimal strategies in the general case, are held by the optimal strategy when 
one of the random variables has a known distribution. Hence, the rather de- 
tailed proofs in this section. 

We establish a series of lemmas describing some properties and the form of 
the optimal strategy and then obtain an explicit statement of it. 

Lemma 4.1. If Y is required at any trial according to an optimal strategy, then 
Y is required thereafter. 

Proor. First, it is easily seen that if at any trial Y is required, then the 
optimal choice for the next trial is independent of whether Y wins or not. 

Now suppose that at some trial, let us say at the first one, Y is required 
and is used r times, but that X is allowed on the (r + 1)-st trial. Then the 
expected winnings are 

»1 


»1 
(1) rq + Wa+alF’,q) | pdF + W,,.(F’,q) | (1 — p) dP, 


where F” and F’ are the a posteriori probabilities defined in Section 3.3, and 
W,(F, q) is the expected gain against a priori F in k trials pursuing an optimal 
strategy. But using X first followed by r trials of Y yields the same amount, 
contradicting the fact that Y was required on the first trial. Hence Y must be 
required throughout. 

As a consequence of Lemma 4.1, we can characterize an optimal strategy.
We use the notation

$$\mu_i = \int_0^1 p^i\,dF.$$

Lemma 4.2. There exists a function, $Q$, of $n$ and $F$ such that for $n$ trials remaining
and $F$ the a priori distribution of $p$ at that time, $Y$ is required if and only if $q >
Q(n, F)$.

Proof. From Lemma 4.1, $Y$ is required if and only if

$$nq > \mu_1 + \mu_1W_{n-1}(F', q) + (1 - \mu_1)W_{n-1}(F'', q) = nK_n(F, q). \tag{1}$$

Now $W_1(F, q) = \max\{q, \mu_1\}$ and hence is nondecreasing convex in $q$ for all $F$.
By easy induction, $W_n(F, q)$ is nondecreasing convex in $q$ for all $F$ and $n$ and,
hence, so is $K_n(F, q)$. Since $K_n(F, 0) = \mu_1 > 0$ and $K_n(F, 1) = 1 - (1 - \mu_1)/n
< 1$, it follows that for each $n$ and $F$ there is a point $Q(n, F)$ such that $q >
Q(n, F)$ if and only if $q > K_n(F, q)$; i.e., if and only if $Y$ is required.

We shall adopt the convention that if $q = Q(n, F)$ we shall always use $X$,
giving us a definite optimal strategy:

If $q > Q(n, F)$, use $Y$ for all $n$ trials. If $q \le Q(n, F)$, use $X$ on the first trial
and compute the a posteriori distribution of $p$, $F^*$, and compare $q$ and $Q(n - 1, F^*)$,
following the above rules for choice at the second trial, etc.
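The threshold $Q(n, F)$ can be computed numerically from the indifference condition in the proof of Lemma 4.2. The sketch below is an illustration under stated assumptions: the prior on $p$ is taken to be Beta$(a, b)$, so that posteriors stay in the same family, and the names `optimal_value`, `x_first_value`, and `threshold` are hypothetical. It relies on Lemma 4.1 to represent the choice of $Y$ as "use $Y$ on every remaining trial".

```python
from functools import lru_cache


def optimal_value(n, q, a, b):
    """W_n(F, q) for a Beta(a, b) prior on p and known success probability q of Y."""

    @lru_cache(maxsize=None)
    def w(k, s, f):
        # k trials remain; X has so far produced s successes and f failures.
        if k == 0:
            return 0.0
        mean = (a + s) / (a + b + s + f)      # posterior mean of p
        use_x = (mean * (1.0 + w(k - 1, s + 1, f))
                 + (1.0 - mean) * w(k - 1, s, f + 1))
        return max(k * q, use_x)              # Lemma 4.1: Y, once chosen, is kept

    return w(n, 0, 0)


def x_first_value(n, q, a, b):
    """n * K_n(F, q): expected winnings when X is used first, then optimally."""
    mean = a / (a + b)
    return (mean * (1.0 + optimal_value(n - 1, q, a + 1, b))
            + (1.0 - mean) * optimal_value(n - 1, q, a, b + 1))


def threshold(n, a, b, iters=60):
    """Q(n, F): the q at which using Y throughout and using X first are equally good."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if n * mid > x_first_value(n, mid, a, b):
            hi = mid   # Y is required at this q, so Q lies below
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Bisection is justified because `x_first_value` increases in `q` with slope at most `n - 1`, so `n * q - x_first_value(n, q, a, b)` is strictly increasing. For the uniform prior (`a = b = 1`) the computed thresholds for `n = 2` and `n = 3` are approximately 5/9 and 13/22.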

Having characterized the optimal strategy, we turn to a series of lemmas 
describing more precisely its form and properties. 

Lemma 4.3. For all $F$ and $n \ge 2$, $Q(n, F) \ge Q(n - 1, F)$.

Proof. Suppose the contrary. Then for $Q(n, F) < q < Q(n - 1, F)$, $Y$ would
be required on the first trial and $X$ on the second, contradicting Lemma 4.1.

Lemma 4.4. For all $F$, $q$, and $n$,

$$W_n(F', q) \ge W_n(F, q) \ge W_n(F'', q).$$
PROOF. 
I: q = max{Q(n, F’), Q(n, F), Q(n, F’)}. 
Then ng = W,(F", q) = W,A(F, q) = W,AF’, 9). 
II: q < min{Q(n, F’), Q(n, F), Q(n, F*)}. 


We proceed by induction. The lemma holds for n = 1, since for all g and F, 


f an 
(1) max 44, us = max {q, m} = max {a = ba) 
. Mi 


1 — uJ 


In the case under consideration, 

(2) W,AF’, q) = 2 + 2 Wi-alF”, q) + (1 is *) Wi-alF”, q) = = + A. ’ 
M1 Mi Mi Mi 

(3) WCF, 9g) = m1 + mWralF’, g) + (1 — m)WialF’, 9) = un + Ba, 


WF’, q) = a a a — & W,-(F”, q) 
= mm ae 


(4) 
_ 41 — Pe Sf — “1— & 
+ (: 1 ro ") WiuAl(F , q) ; ~ o@ — + Cr . 
By the induction hypothesis, 
Wia(F", q) = Wia(F’, q) = Waal”, 9) 
Wia(F", gq) = WaalF’, ¢) = Wail”, 9); 


(5) 


and it is easily shown that since 


Me Mi — He 
—?ar——, 
M1 1 — p 


A, = B, = C, . Thus the lemma is established for this case. 

As a consequence of Case II, Q(n, F’) S min{Q(n, F), Q(N, F’)}. For if (say) 
Q(n, F) = min{Q(n, F), Q(n, F*)} < Q(n, F’), then for g = Q(n, F), nq = 
W,(F, q) < W,(F’, q), a contradiction of the case just established. 


IIT: Q(n, FP) < q & min{Q(n, F), Q(n, F’)}. 


Then W,(F’, g) = nq S min{W,(F, q) W,(F’, q)}. But by induction argument 
parallel to that for Case II, it is shown that W,(F, q) = W,(F’, q). 

From Case III it follows by the same reasoning as above that Q(n, F) < 
Q(n, F*). Hence, there is only one remaining case. 


IV: Q(n, F’) S Q(n, F) < q < Q(n, F’). 


Immediately, W,(F’, g) = W,(F, g) = nq S W,(F’, q), and the lemma is 
established. 

Interspersed in the proof just completed is the proof of

Lemma 4.5.

$$Q(n, F'') \le Q(n, F) \le Q(n, F').$$

Lemma 4.6. Following the optimal strategy, if a success occurs on any trial,
then the same random variable is used on the next trial; i.e., stay with a winner.

Proof. In view of Lemma 4.1, we need only show that if $X$ is required and
wins, then $X$ is required on the next trial. It is clearly sufficient to show this
for the first trial. Suppose to the contrary that $Q(n, F) \ge q > Q(n - 1, F')$.
By Lemma 4.3, $Q(n - 1, F') \ge Q(n - 2, F') \ge \cdots \ge Q(1, F')$ and, clearly,
$Q(1, F') = \mu_2/\mu_1 \ge \mu_1$. Hence, $q > \mu_1$.

By Lemma 4.5, $q > Q(n - 1, F'')$ also. Consequently $Y$ is required on the
second trial, regardless of the outcome of the first. Then

$$nq \le W_n(F, q) = \mu_1 + (n - 1)q. \tag{1}$$

Hence, $q \le \mu_1$, and we have a contradiction.

Lemma 4.7. The a priori expected value of the yield on the $r$-th step is nonde-
creasing in $r$ when using an optimal strategy.

Proof. The proof can be obtained with the aid of the foregoing lemmas; it
is left as an exercise for the reader.

As we have noted in Section 3, Lemma 4.7 is not true, in general, while Lemma
4.6 is the “stay on a winner” rule which, appealing as it is, does not hold in
general.

With the above lemmas we are in a position to determine explicitly the value
of $Q(n, F)$. Assume that $q = Q(n, F)$; then the optimal strategy has the follow-
ing form for appropriate $k_i$.

(4.1) (A) Observe $X$ until a failure occurs.

(B) There exists an integer $k_1 \ge 0$ such that if at least $k_1$ successes
preceded the first failure, continue with $X$; otherwise switch to $Y$ for the re-
maining trials.

(C) There is an integer $k_2 \ge 0$ attached to the second failure such that
if at least $k_1 + k_2$ successes with $X$ precede the second failure of $X$, continue
with $X$; otherwise switch to $Y$ for the remaining trials.

(D) In general, let $S_r$ be the number of successes that precede the $r$-th
failure of $X$. If $S_r \ge k_1 + k_2 + \cdots + k_r$, continue with $X$; otherwise switch to
$Y$ for the remaining trials.

Thus, any sequence $k = (k_1, k_2, \ldots, k_n)$ of integers, $0 \le k_i \le n$, corresponds
to a strategy of the same form as the optimal.

Let $E_k$ denote expectation given $k$ and $F$, and $E_{kp}$ denote expected value given
$k$ and $p$.³ In using any strategy for $n$ trials, $X$ will be used a certain number,
$N_x$, of times, and there will be a certain number, $S_x$, of successes with $X$;
similarly for $Y$.

Theorem 4.1.

$$Q(n, F) = \max_k \frac{E_k[S_x]}{E_k[N_x]}.$$

Proof. $q = Q(n, F)$ implies $nq = W_n(F, q)$. But since the optimal strategy
corresponds to a sequence $k$, this is equivalent to $nq = \max_k\{E_k[S_x] + E_k[S_y]\}$.
However, $E_k[S_y] = qE_k[N_y]$, and neither $E_k[S_x]$ nor $E_k[N_x]$ depends on $q$. Hence,
since $E_k[N_x] + E_k[N_y] = n$, $q = Q(n, F)$ implies each of the following equivalent
statements: $nq = \max_k\{E_k[S_x] + qE_k[N_y]\}$; $q \ge E_k[S_x]/E_k[N_x]$ for all $k$ with
equality for some $k$; $q = \max_k E_k[S_x]/E_k[N_x]$.

Corollary.

$$Q(n, F) = \max_k \frac{\int_0^1 (n - E_{kp}[N_y])\,p\,dF}{\int_0^1 (n - E_{kp}[N_y])\,dF}.$$

³ The authors are indebted to the referee for suggesting the following derivation of
$Q(n, F)$, which is somewhat simpler and more illuminating than that originally used to
obtain the result.

We give two methods of evaluating $Q(n, F)$. The first proceeds by obtaining
directly a formula for $E_{kp}[N_y]$ and yields


E,,IN,] sin > (n a jp? 701 Pe pr 
j=l 


: (3) @(j)-1 ' 
(—1 /). - 2d = 2 f,—-1 
— Il ~~ - — 1 
F f ¢(j)—1 


ven} 


t=0 


where $(j) = max{i: >-i_, ky + i < j} = number of failures of X in the first 
(7 — 1) trials and >" denotes the sum over all choices of f;’s such that f; = 0: 
fi = 0, if kigs = 0, fo = 0; lof: S v and Di = $(j) (f; denotes the number 
of failures between the (ki + hk. + --- k,)-th and (ki + ke +--+ + kjy41)-th 
success). 

The second proceeds by obtaining directly a formula for $E_{k,p}[S_x] = E_{k,p}[N_x]\,p$.
While more complicated in appearance and derivation, it is the result of a
direct counting.

(4.3) Ex,[S.] = Dorao 


r=mOQ lr; 


where 


1 
= a —1 eee 
I a 2, we paar ') + 
ke + ees 1 2—a; (> ae i) es - + — ‘) 
+( a—1 you? + * a, — | I 
T—a,—a2°*+ay—) ( \ 
a,~—1 et kui + a, — 1 ( 
oe epee eet) 


oli a 


i=l 


br 


r 
n— y kj-r 
2 


Pp t=—1 pitts Ne i Sal Dp)’, 


with b, = r — a; — @ — +++ — a,, and we interpret (— |) = | and ( °,) = 
0, fore ~ —1. 





Some special cases are worth noting:

$$Q(2, F) = \frac{\int_0^1 (1 + p)\,p\, dF}{\int_0^1 (1 + p)\, dF},$$

$$Q(3, F) = \max\left\{ \frac{\int_0^1 (1 + p + p^2)\,p\, dF}{\int_0^1 (1 + p + p^2)\, dF},\ \ \frac{\int_0^1 (1 + 2p)\,p\, dF}{\int_0^1 (1 + 2p)\, dF} \right\}.$$

Each term of Q(3, F) can occur; e.g., for F(p) = p, the first is the maximum
(value 13/22), while for $F(p) = p^{1/5}$, the second is the maximum (value 23/88).
The expression for Q(n, F) cannot be simplified in any essential way, which
again testifies to the complex nature of the optimal strategies in sequential
design problems.
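The two quoted values can be reproduced with exact arithmetic. The following is a small sketch, assuming priors of the form $F(p) = p^a$ on [0, 1], for which the j-th moment is $a/(a + j)$; the helper names `moment`, `q2` and `q3_terms` are hypothetical. The two candidates for Q(3, F) are the ratios of $\int (1 + p + p^2)p\,dF$ to $\int (1 + p + p^2)\,dF$ and of $\int (1 + 2p)p\,dF$ to $\int (1 + 2p)\,dF$.

```python
from fractions import Fraction

def moment(a, j):
    """j-th moment of p under F(p) = p**a on [0, 1]: a / (a + j)."""
    a = Fraction(a)
    return a / (a + j)

def q2(a):
    mu = [moment(a, j) for j in range(3)]          # mu[0] = 1
    return (mu[1] + mu[2]) / (mu[0] + mu[1])

def q3_terms(a):
    mu = [moment(a, j) for j in range(4)]
    first = (mu[1] + mu[2] + mu[3]) / (mu[0] + mu[1] + mu[2])
    second = (mu[1] + 2 * mu[2]) / (mu[0] + 2 * mu[1])
    return first, second

print(q3_terms(1))               # uniform prior
print(q3_terms(Fraction(1, 5)))  # F(p) = p**(1/5)
```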
If one chooses $k_1 = r$, $k_2 = k_3 = \cdots = 0$, then

$$\frac{E_k[S_x]}{E_k[N_x]} = \frac{\mu_1 + \mu_2 + \cdots + \mu_r + (n - r)\mu_{r+1}}{1 + \mu_1 + \cdots + \mu_{r-1} + (n - r)\mu_r}, \tag{4.4}$$

where $\mu_i = \int_0^1 p^i\, dF(p)$. For distributions such that $\sum_{r=1}^{\infty} \mu_r = \infty$, at least, a
reasonable approximation to Q(n, F) may be had by taking r = n - 1 and using

$$L(n, F) = \frac{\mu_1 + \mu_2 + \cdots + \mu_n}{1 + \mu_1 + \cdots + \mu_{n-1}}$$

in place of Q(n, F). For the uniform distribution, Q and L coincide for $n \le 4$,
but not for larger n. It is worth remarking that L shares many of the properties
of Q.
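As an illustration, the ratio (4.4), namely $(\mu_1 + \cdots + \mu_r + (n - r)\mu_{r+1})/(1 + \mu_1 + \cdots + \mu_{r-1} + (n - r)\mu_r)$, and the approximation L(n, F) obtained from it with r = n - 1 can be evaluated from the moments alone. The sketch below assumes the uniform prior, with $\mu_i = 1/(i + 1)$; the function names are illustrative only.

```python
from fractions import Fraction

def ratio_44(mu, n, r):
    """Right-hand side of (4.4); mu(i) is the i-th moment of F, mu(0) = 1."""
    num = sum(mu(i) for i in range(1, r + 1)) + (n - r) * mu(r + 1)
    den = sum(mu(i) for i in range(0, r)) + (n - r) * mu(r)
    return num / den

def L(mu, n):
    """L(n, F): the ratio (4.4) with r = n - 1."""
    return ratio_44(mu, n, n - 1)

def uniform(i):
    return Fraction(1, i + 1)

values = [L(uniform, n) for n in range(2, 8)]
print(values)
```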

LEMMA 4.8. L is nondecreasing in n, and $L(n, F^*) \ge L(n, F) \ge L(n, F')$.

The proof is omitted. 

We close by noting that if the number of trials is sufficiently large, one should
almost always commence by using X. More precisely, we see from (4.4) that as
$n \to \infty$, Q(n, F) becomes at least $\mu_{r+1}/\mu_r$, and this for every r. But $\mu_{r+1}/\mu_r$
is the expected value of p given r successes, which will tend to the supremum of
the spectrum of F as r increases. Clearly, if q is greater than the supremum of
the spectrum of F, one would never play X; while if q is less than the supremum,
for all sufficiently large n, X should be used first. Finally, for a fixed n we have
the following intuitive result:
LEMMA 4.9. Given n and F, Y should never be used if

$$q \le \frac{\int_0^1 p(1 - p)^{n-1}\, dF}{\int_0^1 (1 - p)^{n-1}\, dF}.$$

The rigorous proof may be supplied by the reader. 


5. An applied problem. An interesting problem in industrial inspection is 
closely related to the problem of Section 4. Suppose that lots of n items are 
produced by a process having probability p of producing a defective where p 
varies from lot to lot according to an a priori distribution, F(p). Let the loss 
per defective item accepted be unity and the cost of inspection be c per item 
inspected (c < 1). Items are drawn and inspected (defective items found being 
replaced by good items at no additional cost) until a sequential stopping rule 
terminates inspection, at which point the remainder of the lot is accepted. A 
stopping rule is desired which will minimize the expected loss. 

One may proceed to attack this problem in the spirit of Section 4 and find 
a completely analogous series of lemmas, culminating in the theorem that for n 
items remaining and a priori distribution F, it is optimal to inspect another 
item or to accept the remaining n according as c is less than or greater than 
Q(n, F). The same result is more immediately obtained by noting that the
problem is equivalent to finding a rule to maximize the gain if one wins c for
each item not inspected, nothing for each good item inspected, and one for
each defective item inspected (and replaced). This latter problem is precisely
that treated in Section 4.
TABLES FOR COMPUTING BIVARIATE NORMAL PROBABILITIES


By DONALD B. OWEN


Sandia Corporation


1. Introduction. Various tables have been published for obtaining proba- 
bilities over rectangles for correlated bivariate normal variables. Some of these 
tables give the probabilities as functions of three parameters (see [1], [2], and 
[3]). Others tabulate related two-parameter families from which these proba- 
bilities may be computed (see [3], [4], [5], [6], and [7]). The tables given here 
are of the latter type. They have been computed for use with a special two- 
dimensional interpolation scheme, which is described in Section 4. These new 
tabulations reduce considerably the amount of interpolation work required 
over that needed with previous tables. The function tabulated also eliminates 
an arctangent function from the formula for the bivariate normal over a region 
outside of a rectangle as compared with the formula for Nicholson’s tabulation 
in [5]. Section 3 contains a derivation of the formulas given in Section 2 for 
using a two-parameter table to compute probabilities over rectangles. The 
tables given below should prove very useful, since examples where bivariate 
normal integrals over polygons are needed to solve practical problems abound 
in the literature. For example, see [6], [8], [9], and [10]. 

The usefulness of the T(h, a) function tabulated below was also recognized 
by Professor Harry A. Bender, University of Rhode Island, who submitted, 
after this paper was received by the editor, a somewhat shorter tabulation 
than given here. An abstract of Professor Bender’s paper appears in [15]. 

For h and a > 0, T(h, a), the function tabulated, gives the volume of an
uncorrelated bivariate normal distribution with zero means and unit variances
over the area between y = ax and y = 0 and to the right of x = h, i.e., the
area shaded in Fig. 1.

Cadwell in [11] gives a method for obtaining the volume of a bivariate normal 
over any polygon. In Fig. 2, if AB is a side of any polygon, then the volume 
over the shaded area for an uncorrelated bivariate normal with zero means and 
unit variances is given by 


$$T(h, a_2) - T(h, a_1)$$


for $a_2 > a_1$, where h is the length of the perpendicular from the origin to the
line through AB and $a_2 h$ is the distance from the foot of the perpendicular, C,
to B and $a_1 h$ is CA. If C lies between A and B, then the T-functions are added
instead of subtracted. By composition of volumes like this, it is possible to
obtain the volume over the area outside of any polygon. Section 2 includes
some useful formulas for doing this.
FIG. 1. The area over which T(h, a) gives the volume of a standardized bivariate normal
with correlation zero.


FIG. 2. A typical area for computing the bivariate normal over a polygon.


2. Summary of formulas. The fundamental formula for finding volumes 
over rectangles is 


$$B(h, k; \rho) = \tfrac12 G(h) + \tfrac12 G(k) - T(h, a_h) - T(k, a_k) - \left\{ \begin{matrix} 0 \\ \tfrac12 \end{matrix} \right\}, \tag{2.1}$$

where the upper choice is made if hk > 0 or if hk = 0 but $h + k \ge 0$, and the
lower choice is made otherwise, where

$$a_h = \frac{k}{h\sqrt{1 - \rho^2}} - \frac{\rho}{\sqrt{1 - \rho^2}}, \qquad a_k = \frac{h}{k\sqrt{1 - \rho^2}} - \frac{\rho}{\sqrt{1 - \rho^2}}, \tag{2.2}$$

and where $B(h, k; \rho)$ is the volume of a bivariate normal with zero means and
unit variances and correlation $\rho$ over the lower left-hand quadrant of the xy-plane
when divided at x = h and y = k, G(h) is the univariate normal with zero mean
and unit variance integral from minus infinity to h, and T(h, a) is the function
tabulated below.

The T-function is tabulated only for $0 < a \le 1$, and $\infty$, but it is possible to
obtain values for $1 < a < \infty$ by use of the following formula:

$$T(h, a) = \tfrac12 G(h) + \tfrac12 G(ah) - G(h)G(ah) - T\left(ah, \frac{1}{a}\right). \tag{2.3}$$


Values for negative a or h may be obtained by using 
(2.4) T(h, —a) = —T(h, a) 
and 

(2.5) T(—h, a) = T(h, a). 


Note that (2.3) requires a to be positive and hence when a is negative, first apply 
(2.4) and then (2.3). 
Other useful formulas are:

$$T(h, 0) = 0,$$

$$T(0, a) = \frac{1}{2\pi} \arctan a,$$

$$T(h, 1) = \tfrac12 G(h)[1 - G(h)],$$

and

$$T(h, \infty) = \begin{cases} \tfrac12 [1 - G(h)] & \text{if } h \ge 0, \\ \tfrac12 G(h) & \text{if } h \le 0. \end{cases}$$
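The identities of this section can be checked numerically. The sketch below is not the method used for the tables; it assumes the integral form of the T-function, $T(h, a) = (1/2\pi)\int_0^a \exp[-\tfrac12 h^2(1 + x^2)]/(1 + x^2)\,dx$ (the form given as (3.3) in Section 3), and evaluates it with Simpson's rule. The step count and the function names are choices of this example.

```python
import math

def G(x):
    """Standard normal integral from minus infinity to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def T(h, a, steps=2000):
    """T(h, a) by Simpson's rule on [0, a]; steps must be even.
    A negative a gives a negative step, hence T(h, -a) = -T(h, a)."""
    def f(x):
        return math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    w = a / steps
    total = f(0.0) + f(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * w)
    return total * w / 3.0 / (2.0 * math.pi)

h, a = 0.7, 2.0
# Formula (2.3): reduce a > 1 to an argument 1/a < 1.
reduced = 0.5 * G(h) + 0.5 * G(a * h) - G(h) * G(a * h) - T(a * h, 1.0 / a)
print(T(h, a), reduced)
```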


For finding volumes of the general correlated bivariate normal over polygons,
the first step is to make a rotation and stretching of the axes to reduce the
function under the integral to the form of the T-function. A transformation
that will do this is

$$u = \frac{1}{\sqrt{2 + 2\rho}} \left[ \frac{x - \mu_X}{\sigma_X} + \frac{y - \mu_Y}{\sigma_Y} \right],$$

$$v = \frac{1}{\sqrt{2 - 2\rho}} \left[ \frac{x - \mu_X}{\sigma_X} - \frac{y - \mu_Y}{\sigma_Y} \right],$$

for $|\rho| < 1$, where $\mu_X$, $\mu_Y$ are the means of the X and Y variables and $\sigma_X$, $\sigma_Y$
are the standard deviations of the X and Y variables, respectively. This will
take the original polygon into another polygon in the uv-plane. The vertices of
the new polygon should be computed and a graph drawn. For each side of the
polygon the volume over a region like that shown in Fig. 2 may be computed
with the aid of these formulas:

$$h = \frac{|h_1 k_2 - h_2 k_1|}{\sqrt{(h_2 - h_1)^2 + (k_2 - k_1)^2}},$$

$$a_1 = \frac{|h_1(h_2 - h_1) + k_1(k_2 - k_1)|}{|h_1 k_2 - h_2 k_1|},$$

$$a_2 = \frac{|h_2(h_2 - h_1) + k_2(k_2 - k_1)|}{|h_1 k_2 - h_2 k_1|},$$

where the vertical bars indicate absolute value and where $(h_1, k_1)$ and $(h_2, k_2)$
are the coordinates of two adjacent vertices on the polygon. With the aid of
the graph, these volumes are then easily combined to give the volume over the
outside (or inside) of the polygon.
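For one side of the transformed polygon, the three quantities are simple functions of the two vertices: h is the distance from the origin to the line through the side, and $a_1 h$, $a_2 h$ are the distances from the foot of the perpendicular to the two vertices. A minimal sketch (the function name `side_parameters` is hypothetical, and the side is assumed not to lie on a line through the origin, since the denominator $|h_1 k_2 - h_2 k_1|$ would then vanish):

```python
import math

def side_parameters(vertex1, vertex2):
    """Return (h, a1, a2) for the polygon side joining vertex1 and vertex2."""
    (h1, k1), (h2, k2) = vertex1, vertex2
    cross = abs(h1 * k2 - h2 * k1)
    length = math.hypot(h2 - h1, k2 - k1)
    h = cross / length                                   # distance origin to line
    a1 = abs(h1 * (h2 - h1) + k1 * (k2 - k1)) / cross
    a2 = abs(h2 * (h2 - h1) + k2 * (k2 - k1)) / cross
    return h, a1, a2

# Side on the vertical line at distance 1 from the origin;
# the foot of the perpendicular is (1, 0).
print(side_parameters((1.0, -1.0), (1.0, 2.0)))
```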


3. Derivation of the relationship between the bivariate normal and the 
tabulated function. 
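Let

$$B(h, k; \rho) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \int_{-\infty}^{h} \int_{-\infty}^{k} \exp\left[-\tfrac12 (x^2 - 2\rho xy + y^2)/(1 - \rho^2)\right] dy\, dx, \tag{3.1}$$

$$G(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp(-\tfrac12 t^2)\, dt, \tag{3.2}$$

and

$$T(h, a) = \frac{1}{2\pi} \int_0^a \frac{\exp[-\tfrac12 h^2 (1 + x^2)]}{1 + x^2}\, dx. \tag{3.3}$$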


It is also convenient to have a second form of (3.3), which is the function in 
Tables A, B and C. It may be obtained by differentiating with respect to h 
and then reintegrating. The result is 


$$T(h, a) = -\frac{1}{2\pi} \int_0^h \int_0^{ax} \exp[-\tfrac12 (x^2 + y^2)]\, dy\, dx + \frac{\arctan a}{2\pi}. \tag{3.4}$$


The T-function is related to the V-function tabulated by Nicholson in [5] as
follows:


$$T(h, a) = \frac{1}{2\pi} \arctan a - V(h, ah).$$


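If (3.4) is integrated by parts,

$$T(h, a) = \begin{cases} \tfrac12 G(h) + \tfrac12 G(ah) - G(h)G(ah) - T(ah, 1/a) & \text{if } a \ge 0, \\ \tfrac12 G(h) + \tfrac12 G(ah) - G(h)G(ah) - T(ah, 1/a) - \tfrac12 & \text{if } a < 0. \end{cases} \tag{3.5}$$

It will be shown that (3.1) can be expressed as a function of expressions like
(3.2) and (3.3). If (3.1) is differentiated with respect to $\rho$, then integration with
respect to x and y can be effected. Integrating that result with respect to $\rho$
yields

$$B(h, k; \rho) = \frac{1}{2\pi} \int_0^{\rho} (1 - z^2)^{-1/2} \exp\left[-\tfrac12 (h^2 - 2hkz + k^2)/(1 - z^2)\right] dz + G(h)G(k). \tag{3.6}$$

From this $B(0, 0; \rho) = 1/(2\pi) \arcsin \rho + \tfrac14$, a well-known result (see [12], [13],
and [14]). Now (3.6) may be rewritten as

$$B(h, k; \rho) = \frac{1}{2\pi} \int_0^{\rho} \frac{h(h - kz)}{(1 - z^2)^{1/2}(h^2 - 2hkz + k^2)} \exp\left[-\tfrac12 (h^2 - 2hkz + k^2)/(1 - z^2)\right] dz$$
$$\qquad + \frac{1}{2\pi} \int_0^{\rho} \frac{k(k - hz)}{(1 - z^2)^{1/2}(h^2 - 2hkz + k^2)} \exp\left[-\tfrac12 (h^2 - 2hkz + k^2)/(1 - z^2)\right] dz + G(h)G(k).$$

In the integrals above, making the substitutions

$$x = \frac{k - hz}{h\sqrt{1 - z^2}} \quad \text{and} \quad x = \frac{h - kz}{k\sqrt{1 - z^2}},$$

respectively, produces

$$B(h, k; \rho) = T\left(h, \frac{k}{h}\right) - T\left(h, \frac{k - \rho h}{h\sqrt{1 - \rho^2}}\right) + T\left(k, \frac{h}{k}\right) - T\left(k, \frac{h - \rho k}{k\sqrt{1 - \rho^2}}\right) + G(h)G(k). \tag{3.7}$$

Applying (3.5) to (3.7) gives

$$B(h, k; \rho) = \begin{cases} \tfrac12 G(h) - T\left(h, \dfrac{k - \rho h}{h\sqrt{1 - \rho^2}}\right) + \tfrac12 G(k) - T\left(k, \dfrac{h - \rho k}{k\sqrt{1 - \rho^2}}\right) & \text{if } hk > 0 \text{ or if } hk = 0,\ h \text{ or } k \ge 0, \\[2ex] \tfrac12 G(h) - T\left(h, \dfrac{k - \rho h}{h\sqrt{1 - \rho^2}}\right) + \tfrac12 G(k) - T\left(k, \dfrac{h - \rho k}{k\sqrt{1 - \rho^2}}\right) - \tfrac12 & \text{if } hk < 0 \text{ or if } hk = 0,\ h \text{ or } k < 0, \end{cases} \tag{3.8}$$

which expresses the bivariate normal in terms of the G- and T-functions in a
compact form.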
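The agreement between the T-function form of $B(h, k; \rho)$ and the single-integral form (3.6) can be checked numerically. The sketch below is an illustration only: it uses Simpson's rule for both integrals, takes both h and k nonzero so that the arguments $(k - \rho h)/(h\sqrt{1 - \rho^2})$ and $(h - \rho k)/(k\sqrt{1 - \rho^2})$ are finite, and all function names are assumptions of this example.

```python
import math

def G(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def simpson(f, lo, hi, steps=2000):
    """Composite Simpson rule; steps must be even. Works for hi < lo too."""
    w = (hi - lo) / steps
    total = f(lo) + f(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * w)
    return total * w / 3.0

def T(h, a):
    f = lambda x: math.exp(-0.5 * h * h * (1.0 + x * x)) / (1.0 + x * x)
    return simpson(f, 0.0, a) / (2.0 * math.pi)

def B_from_T(h, k, rho):
    """B(h, k; rho) from G and T, for h != 0 and k != 0."""
    s = math.sqrt(1.0 - rho * rho)
    a_h = (k - rho * h) / (h * s)
    a_k = (h - rho * k) / (k * s)
    shift = 0.0 if h * k > 0 else 0.5
    return 0.5 * G(h) + 0.5 * G(k) - T(h, a_h) - T(k, a_k) - shift

def B_from_rho_integral(h, k, rho):
    """B(h, k; rho) from the integral over the correlation, as in (3.6)."""
    f = lambda z: math.exp(-0.5 * (h * h - 2 * h * k * z + k * k) / (1 - z * z)) \
        / math.sqrt(1 - z * z)
    return simpson(f, 0.0, rho) / (2.0 * math.pi) + G(h) * G(k)

for args in [(0.5, 1.0, 0.6), (-0.4, 0.8, -0.3), (1.2, 0.3, 0.0)]:
    print(args, B_from_T(*args), B_from_rho_integral(*args))
```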

A series expression for T(h, a) may be obtained by expanding the numerator
of the integrand of (3.3) in the usual exponential series, dividing by the
denominator, and integrating term by term. Rearrangement of the terms of this
series gives

$$T(h, a) = \frac{1}{2\pi} \left\{ \arctan a - \sum_{j=0}^{\infty} c_j a^{2j+1} \right\}, \tag{3.9}$$

where

$$c_j = \frac{(-1)^j}{2j + 1} \left\{ 1 - \exp(-\tfrac12 h^2) \sum_{i=0}^{j} \frac{h^{2i}}{2^i\, i!} \right\},$$

which converges rapidly for small values of a and h.
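A direct implementation of the series is short. The sketch below assumes the form $T(h, a) = (1/2\pi)\{\arctan a - \sum_j c_j a^{2j+1}\}$ with $c_j = \frac{(-1)^j}{2j+1}\{1 - e^{-h^2/2}\sum_{i \le j} h^{2i}/(2^i i!)\}$; the truncation at 60 terms and the function name are choices of this example, intended for $|a| \le 1$.

```python
import math

def T_series(h, a, terms=60):
    """T(h, a) from the series (3.9), truncated after `terms` terms."""
    u = 0.5 * h * h
    poisson_term = 1.0        # u**i / i! for i = 0
    partial = 1.0             # sum of u**i / i! for i <= j
    total = 0.0
    for j in range(terms):
        if j > 0:
            poisson_term *= u / j
            partial += poisson_term
        c_j = (-1) ** j / (2 * j + 1) * (1.0 - math.exp(-u) * partial)
        total += c_j * a ** (2 * j + 1)
    return (math.atan(a) - total) / (2.0 * math.pi)

print(T_series(0.15, 0.625))
```

For the argument (.15, .625) this should agree with the value .0877919 quoted from the series in the interpolation example of Section 4.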


The values of T(h, a) given in Tables A, B, and C were computed using the
series (3.9). They were checked by using Gauss' seven-point integration formula
on (3.3). The tables were also checked by taking differences. These checks show
that at the points of tabulation the table is accurate to as many places as given,
i.e., to six decimal places.


4. Interpolation in the tables. Table A has a coarse interval in the parameter 
a and an interval fine enough for ordinary linear interpolation in the parameter 
h. Table B has intervals in parameter a fine enough for ordinary linear inter- 
polation and has parameter h at a coarse interval. Ordinary linear interpolation 
may be used throughout Table C. Tables A and B were designed for interpolation
as follows: To interpolate for a value $T(h_2, a_2)$, say, $a_1$ and $a_3$ should be
picked closest to $a_2$ from Table A so that $a_1 \le a_2 < a_3$, and $h_1$ and $h_3$ should be
picked closest to $h_2$ from Table B so that $h_1 \le h_2 < h_3$. Then the interpolated
value of $T(h_2, a_2)$ is obtained from

$$T(h_2, a_2) = \sum_{i=1}^{3} \sum_{j=1}^{3} w_{ij} T(h_i, a_j),$$

where the weights $w_{ij}$ are given by

$$(w_{ij}) = \begin{pmatrix} -(1 - b)(1 - c) & 1 - c & -b(1 - c) \\ 1 - b & 0 & b \\ -(1 - b)c & c & -bc \end{pmatrix},$$

where

$$b = \frac{a_2 - a_1}{a_3 - a_1} \quad \text{and} \quad c = \frac{h_2 - h_1}{h_3 - h_1}.$$

The weights were obtained by considering the result of ordinary linear
interpolation where nearby values of the function are subtracted before interpolating,
say, $T(h_2, a_1) - T(h_1, a_1)$ and $T(h_2, a_3) - T(h_1, a_3)$. These numbers are
interpolated with respect to a to obtain $T(h_2, a_2) - T(h_1, a_2)$, and then $T(h_1, a_2)$
is added. This process may also be followed with $(h_3, a_1)$, $(h_3, a_3)$, and $(h_3, a_2)$.
If the two estimates of $T(h_2, a_2)$ are then combined as in linear interpolation
with respect to h, i.e., (1 - c) times the first estimate plus c times the second,
the above weights $w_{ij}$ follow. The interpolation on the differences could also
have been first with respect to h to obtain the two estimates and then with
respect to a between these two. The same weights $w_{ij}$ are obtained by doing this.

This method of interpolation has resulted in approximately a 90 per cent 
reduction over the size of a table needed for linear interpolation. Quadratic 
interpolation using Bessel’s formula would give comparable results to the new 
method with approximately an additional 80 per cent reduction in the number 
of entries, but the additional work involved more than outweighs that reduc- 
tion in the number of entries, even though the table is used only a few times. 
The procedure given here may be termed a compromise between linear and 
quadratic interpolation. 

EXAMPLE. Find T(.15, .625). From the tables, the following entries are
extracted:


    h \ a       .5          .625        .75

    0         .073792     .088903     .102416
    .15       .072902                 .101082
    .25       .071347     .085848     .098755


The weights to be applied are


    h \ a       .5          .625        .75

    0          -.2          .4         -.2
    .15         .5          0           .5
    .25        -.3          .6         -.3
The result is T(.15, .625) = .0877898. Calculation of this number from the
series gives .0877919. The result of the interpolation therefore provides a
difference of two in the sixth place. Further calculations show that this difference
could be reduced to five in the seventh place if the linear interpolations for
T(0, .625) and T(.25, .625) were eliminated and the exact values for these
points were used. The value for T(0, .625) was rounded up during the linear
interpolation with respect to a in Table B, since second differences in the a
direction for all h are negative. A similar working rule for rounding when
interpolating in the h direction is to round up the interpolated value when 0 < h <
.9 and to round down for $h \ge .9$ in Table A. The value obtained from the above
interpolation scheme should be rounded up for 0 < h < 1.50 and rounded
down for $h \ge 1.50$, for all values of a > 0.
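The example can be retraced in a few lines. The sketch below assumes the weight pattern of this section with $b = (a_2 - a_1)/(a_3 - a_1)$ and $c = (h_2 - h_1)/(h_3 - h_1)$, where row i of the weight array refers to $h_i$ and column j to $a_j$; the function names are illustrative. The centre entry of the table (the value sought) is not needed because its weight is zero, so a placeholder 0.0 is stored there.

```python
def weights(b, c):
    """3 x 3 interpolation weights; rows follow h1, h2, h3, columns a1, a2, a3."""
    return [
        [-(1 - b) * (1 - c), 1 - c, -b * (1 - c)],
        [1 - b, 0.0, b],
        [-(1 - b) * c, c, -b * c],
    ]

def interpolate(table, h, a):
    """table[i][j] = T(h[i], a[j]); returns the estimate of T(h[1], a[1])."""
    b = (a[1] - a[0]) / (a[2] - a[0])
    c = (h[1] - h[0]) / (h[2] - h[0])
    w = weights(b, c)
    return sum(w[i][j] * table[i][j] for i in range(3) for j in range(3))

# Entries extracted in the example; rows h = 0, .15, .25, columns a = .5, .625, .75.
table = [
    [0.073792, 0.088903, 0.102416],
    [0.072902, 0.0,      0.101082],   # centre entry unused (weight 0)
    [0.071347, 0.085848, 0.098755],
]
value = interpolate(table, (0.0, 0.15, 0.25), (0.5, 0.625, 0.75))
print(round(value, 7))
```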

Empirical examination of the errors in interpolation by this scheme shows
that the maximum error that would occur anywhere in Tables A and B is seven
in the sixth decimal place, and that this could be reduced to six in the sixth
decimal place if the linear interpolations in Tables A and B were eliminated
and the exact values were used. Linear interpolation in Table C gives errors
less than four in the sixth decimal place. Table D gives the maximum error
in the sixth decimal place, which will be committed when using the above
interpolation scheme over the ranges of h and a indicated. The sign preceding
the entry is the sign of the exact value of T(h, a) minus the interpolated value
for that difference which is the largest in absolute value. These are empirical
results obtained on the digital computer by using the interpolation process
and the exact value for fifteen points systematically spaced in each block.


TABLE D

[Maximum interpolation error in the sixth decimal place, by ranges of h and a; table entries not recoverable.]


A number in Tables A, B, and C whose last nonzero digit is five is followed
by a plus or minus sign, respectively, to indicate that the number should be
rounded up or down when dropping the digit with the five. Any entry having
the first three digits the same as those of the entry directly above it has had
these digits dropped.
ASYMPTOTIC FORMULAE FOR THE DISTRIBUTION OF HOTELLING'S
GENERALIZED $T_0^2$ STATISTIC¹


By KOICHI ITO²
University of North Carolina


1. Summary. In this paper the asymptotic expansion of a percentage point
of Hotelling's generalized $T_0^2$ distribution is derived in terms of the corresponding
percentage point of a $\chi^2$ distribution. Our result generalizes Hotelling's and
Frankel's asymptotic expansion for the generalized Student T [3], [4]. The
technique used in this paper for obtaining the asymptotic expansion of $T_0^2$ is an
extension of the previous methods of Welch [8] and of James [5], [6], who used
them to solve the distribution problem of various statistics in connection with
the Behrens-Fisher problem. An asymptotic formula for the cumulative
distribution function (c.d.f.) of $T_0^2$ is also given together with an upper bound for the
error committed when all but the first few terms are omitted in the series. This
formula is a sort of multivariate analogue of Hartley's formula of
"Studentization" [2].


2. Introduction. In the multivariate analysis of variance we use the following
canonical probability law:

$$P(X_0, X_1) = \text{const.} \exp\left[-\tfrac12 \operatorname{tr} \Lambda (X_1 - \xi)(X_1 - \xi)' - \tfrac12 \operatorname{tr} \Lambda X_0 X_0'\right] dX_0\, dX_1, \tag{2.1}$$

where $X_1$ and $X_0$ are p × m and p × n matrices, respectively, and $(1/m)X_1 X_1' = S_1$
is the sample "between" dispersion matrix and $(1/n)X_0 X_0' = S_0$ is the sample
"within" dispersion matrix, the prime denoting the transpose of a matrix. $\xi$ is a
p × m matrix, $(1/m)\xi\xi'$ being the population "between" dispersion matrix, and
$\Lambda$ is a p × p symmetric positive definite matrix. It is assumed that m may be
$\ge p$ or $< p$, but $n \ge p$. To test the null hypothesis $H_0$: $\xi = 0$, Hotelling [3]
proposed a test based on the statistic:

$$T_0^2 = m \operatorname{tr} S_1 S_0^{-1}$$

and derived the exact distribution of this statistic when p = 2 and $\xi = 0$. For
general values of p the exact distribution of $T_0^2$ is not available at present, even
in the null case $\xi = 0$.


3. Derivation of asymptotic formula of $T_0^2$. For general values of p it is known
that the statistic

$$\chi^2 = m \operatorname{tr} S_1 \Lambda \tag{3.1}$$

has a $\chi^2$ distribution with mp degrees of freedom. That is to say, we have

$$\Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\} = G_\rho(\theta),$$

where $2\theta$ denotes the tabled value of $\chi^2$ for a particular level of significance,
$\rho = mp/2$, and

$$G_\rho(\theta) = [\Gamma(\rho)]^{-1} \int_0^{\theta} t^{\rho - 1} e^{-t}\, dt.$$

Hence, if $\Lambda$ is known, the statistic $\chi^2$ given by (3.1) may be used to test $H_0$
exactly, and if $\Lambda$ is unknown but if $S_0$ is based on a large number of degrees of
freedom, i.e., if n is large, we may use as an approximation the result

$$\Pr\{m \operatorname{tr} S_1 S_0^{-1} \le 2\theta\} = G_\rho(\theta).$$

This suggests that in the general case we try to find a function $h(S_0)$ of the
elements of $S_0$ such that

$$\Pr\{m \operatorname{tr} S_1 S_0^{-1} \le 2h(S_0)\} = G_\rho(\theta). \tag{3.5}$$

When n is large, $2h(S_0)$ will approach $2\theta = \chi^2$, and we now expect to write
$2h(S_0)$ as a series with $\chi^2$ as its first term and successive terms of decreasing order
of magnitude.

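Now

$$\Pr\{m \operatorname{tr} S_1 S_0^{-1} \le 2h(S_0)\} = \int_R \Pr\{m \operatorname{tr} S_1 S_0^{-1} \le 2h(S_0) \mid S_0\} \Pr\{dS_0\}, \tag{3.6}$$

where the first expression on the right denotes the conditional probability of the
relation indicated for fixed values of the elements of $S_0$, and the second denotes
the probability element of $S_0$, which has a Wishart distribution with n degrees
of freedom, and the domain of integration R is over all possible values of the
elements of $S_0$. Now we may expand $\Pr\{m \operatorname{tr} S_1 S_0^{-1} \le 2h(S_0) \mid S_0\}$ about an
origin $(\sigma_{11}, \sigma_{22}, \cdots, \sigma_{pp}, \sigma_{12}, \cdots, \sigma_{p-1,p})$ in a Taylor series, where

$$\Lambda^{-1} = (\sigma_{ij}) = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{pmatrix}.$$

Thus,

$$\Pr\{m \operatorname{tr} S_1 S_0^{-1} \le 2h(S_0) \mid S_0\} = \left\{ \exp\left[ \sum_{i \le j = 1}^{p} (s_{ij} - \sigma_{ij}) \frac{\partial}{\partial \sigma_{ij}} \right] \right\} \Pr\{m \operatorname{tr} S_1 \Lambda \le 2h(\Lambda^{-1})\}$$
$$\qquad = \{\exp[\operatorname{tr}(S_0 - \Lambda^{-1})\partial]\} \Pr\{m \operatorname{tr} S_1 \Lambda \le 2h(\Lambda^{-1})\}, \tag{3.7}$$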
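where $s_{ij}$ is the ith row, jth column element of $S_0$, and $\partial$ denotes the matrix of
derivative operators:

$$\partial = \begin{pmatrix} \partial_{11} & \cdots & \partial_{1p} \\ \vdots & & \vdots \\ \partial_{p1} & \cdots & \partial_{pp} \end{pmatrix},$$

its typical element being $\partial_{ij} = \tfrac12 (1 + \delta_{ij})(\partial/\partial\sigma_{ij})$, where $\delta_{ij}$ is the Kronecker
delta. Whether uniformly convergent or not, the right-hand side of (3.7) is an
asymptotic representation of $\Pr\{m \operatorname{tr} S_1 S_0^{-1} \le 2h(S_0) \mid S_0\}$, for sufficiently large
values of n. Hence, substitution of (3.7) into (3.6) and term by term integration,
which may be done legitimately, yields:

$$G_\rho(\theta) = \int_R \exp[\operatorname{tr}(S_0 - \Lambda^{-1})\partial] \Pr\{m \operatorname{tr} S_1 \Lambda \le 2h(\Lambda^{-1})\} \Pr\{dS_0\}$$
$$\qquad = \Theta \Pr\{m \operatorname{tr} S_1 \Lambda \le 2h(\Lambda^{-1})\}, \tag{3.9}$$

where

$$\Theta = \int_R \exp[\operatorname{tr}(S_0 - \Lambda^{-1})\partial] \Pr\{dS_0\}.$$

Since $S_0$ has a Wishart distribution with n degrees of freedom, we have

$$\Theta = \exp[-\operatorname{tr} \Lambda^{-1}\partial] \cdot \text{const.}\, |\Lambda|^{n/2} \int_R |S_0|^{(n-p-1)/2} \exp\left[ \operatorname{tr}\left( S_0 \partial - \frac{n}{2} \Lambda S_0 \right) \right] dS_0$$
$$\quad = \exp[-\operatorname{tr} \Lambda^{-1}\partial] \cdot \text{const.}\, |\Lambda|^{n/2} \int_R |S_0|^{(n-p-1)/2} \exp\left[ -\frac{n}{2} \operatorname{tr}\left( \Lambda - \frac{2}{n}\partial \right) S_0 \right] dS_0$$
$$\quad = \exp[-\operatorname{tr} \Lambda^{-1}\partial] \cdot |\Lambda|^{n/2} \left| \Lambda - \frac{2}{n}\partial \right|^{-n/2}$$
$$\quad = \exp[-\operatorname{tr} \Lambda^{-1}\partial] \cdot \left| I - \frac{2}{n}\Lambda^{-1}\partial \right|^{-n/2},$$

where I is the p × p identity matrix. Now using [5],

$$-\log |I - Y| = \operatorname{tr} Y + \tfrac12 \operatorname{tr} Y^2 + \tfrac13 \operatorname{tr} Y^3 + \cdots, \tag{3.10}$$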
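we obtain

$$\Theta = \exp\left[ -\operatorname{tr} \Lambda^{-1}\partial - \frac{n}{2} \log \left| I - \frac{2}{n}\Lambda^{-1}\partial \right| \right]$$
$$\quad = \exp\left[ -\operatorname{tr} \Lambda^{-1}\partial + \frac{n}{2} \left\{ \operatorname{tr}\left( \frac{2}{n}\Lambda^{-1}\partial \right) + \frac12 \operatorname{tr}\left( \frac{2}{n}\Lambda^{-1}\partial \right)^2 + \frac13 \operatorname{tr}\left( \frac{2}{n}\Lambda^{-1}\partial \right)^3 + \cdots \right\} \right]$$
$$\quad = \exp\left[ \frac{1}{n} \operatorname{tr}(\Lambda^{-1}\partial)^2 + \frac{4}{3n^2} \operatorname{tr}(\Lambda^{-1}\partial)^3 + \cdots \right]$$
$$\quad = 1 + \frac{1}{n} \operatorname{tr}(\Lambda^{-1}\partial)^2 + \frac{1}{n^2} \left\{ \frac43 \operatorname{tr}(\Lambda^{-1}\partial)^3 + \frac12 (\operatorname{tr}(\Lambda^{-1}\partial)^2)^2 \right\} + O(n^{-3}). \tag{3.11}$$

It is to be noted here that in (3.11) the operator $\partial$ does not act on $\Lambda^{-1}$ present
in $\Theta$ itself, and it is more useful for our purpose to write (3.11) in suffix form:

$$\Theta = 1 + \frac{1}{n} \sum \sigma_{rs}\sigma_{tu}\partial_{st}\partial_{ur} + \frac{1}{n^2} \left\{ \frac43 \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\partial_{st}\partial_{uv}\partial_{wr} + \frac12 \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}\partial_{st}\partial_{ur}\partial_{wx}\partial_{yv} \right\} + O(n^{-3}), \tag{3.12}$$

where $\sum$ denotes the summation over all suffixes r, s, $\cdots$, each of which ranges
from 1 to p.

Now we represent $h(S_0)$ as

$$h(S_0) = \theta + h_1(S_0) + h_2(S_0) + \cdots, \tag{3.13}$$

$h_s(S_0)$ being of order $n^{-s}$; i.e., we write $h(S_0)$ as an asymptotic series such that

$$|n^s \{h(S_0) - \theta - h_1(S_0) - \cdots - h_s(S_0)\}|$$

is made arbitrarily small for sufficiently large values of n. Then (3.13) may be
substituted into $\Pr\{m \operatorname{tr} S_1 \Lambda \le 2h(\Lambda^{-1})\}$, and by Taylor's expansion we have

$$\Pr\{m \operatorname{tr} S_1 \Lambda \le 2h(\Lambda^{-1})\} = \exp[\{h_1(\Lambda^{-1}) + h_2(\Lambda^{-1}) + \cdots\}D] \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\}$$
$$\quad = \left[ 1 + \{h_1(\Lambda^{-1}) + h_2(\Lambda^{-1}) + \cdots\}D + \tfrac12 \{h_1(\Lambda^{-1}) + h_2(\Lambda^{-1}) + \cdots\}^2 D^2 + \cdots \right] \times \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\}, \tag{3.14}$$

where $D = \partial/\partial\theta$. By substituting (3.12) and (3.14) into (3.9), we obtain

$$G_\rho(\theta) = \left[ 1 + \frac{1}{n} \sum \sigma_{rs}\sigma_{tu}\partial_{st}\partial_{ur} + \frac{1}{n^2} \left\{ \frac43 \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\partial_{st}\partial_{uv}\partial_{wr} + \frac12 \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}\partial_{st}\partial_{ur}\partial_{wx}\partial_{yv} \right\} + O(n^{-3}) \right]$$
$$\qquad \cdot \left[ 1 + h_1(\Lambda^{-1})D + \{h_2(\Lambda^{-1})D + \tfrac12 h_1^2(\Lambda^{-1})D^2\} + O(n^{-3}) \right] \times \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\}. \tag{3.15}$$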
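The expansion of $\Theta$ in powers of 1/n can be sanity-checked in the scalar case. The sketch below is a numerical illustration only, under the assumption that p = 1 and that the operator $\Lambda^{-1}\partial$ is replaced by an ordinary number y, so that $\Theta$ becomes $e^{-y}(1 - 2y/n)^{-n/2}$ and the expansion reads $1 + y^2/n + (\tfrac43 y^3 + \tfrac12 y^4)/n^2 + O(n^{-3})$. The function names are hypothetical.

```python
import math

def theta_exact(y, n):
    """Scalar analogue of Theta: exp(-y) * (1 - 2y/n) ** (-n/2)."""
    return math.exp(-y - 0.5 * n * math.log1p(-2.0 * y / n))

def theta_series(y, n):
    """Expansion through order n**-2, scalar analogue of (3.11)."""
    return 1.0 + y**2 / n + (4.0 * y**3 / 3.0 + 0.5 * y**4) / n**2

y, n = 0.3, 1000
print(theta_exact(y, n) - theta_series(y, n))   # remainder, of order n**-3
print(theta_exact(y, n) - (1.0 + y**2 / n))     # remainder, of order n**-2
```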


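By equating terms of successive order in (3.15), we obtain

$$\left\{ h_1(\Lambda^{-1}) D + \frac{1}{n} \sum \sigma_{rs}\sigma_{tu}\partial_{st}\partial_{ur} \right\} \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\} = 0, \tag{3.16}$$

$$\Big[ h_2(\Lambda^{-1})D + \tfrac12 h_1^2(\Lambda^{-1})D^2 + \frac{1}{n} \sum \sigma_{rs}\sigma_{tu} \{ h_1^{(st,ur)}(\Lambda^{-1})D + 2h_1^{(st)}(\Lambda^{-1})\partial_{ur}D + h_1(\Lambda^{-1})\partial_{st}\partial_{ur}D \}$$
$$\qquad + \frac{4}{3n^2} \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\partial_{st}\partial_{uv}\partial_{wr} + \frac{1}{2n^2} \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}\partial_{st}\partial_{ur}\partial_{wx}\partial_{yv} \Big] \times \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\} = 0, \tag{3.17}$$

and so on, where $h_1^{(st)}(\Lambda^{-1}) = \partial_{st}h_1(\Lambda^{-1})$ and $h_1^{(st,ur)}(\Lambda^{-1}) = \partial_{st}\partial_{ur}h_1(\Lambda^{-1})$.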

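It now remains to carry out the operations $\partial$ and D indicated in (3.16) and
(3.17) in order to obtain $h_1(\Lambda^{-1})$, $h_2(\Lambda^{-1})$ and hence $h_1(S_0)$, $h_2(S_0)$. These operators
will operate on $\Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\}$, which is a p × m-fold integral, and the
operations may be thought of as differentiations, with respect to the boundary
only, of the integral of the probability density function of the $X_1$ throughout a
region in the space of $X_1$. The method used to evaluate $\partial_{rs}\partial_{tu} \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\}$,
$\partial_{rs}\partial_{tu}\partial_{vw} \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\}$, $\cdots$, is to change the boundary slightly, expand
the integral in powers of the quantities specifying this change, and obtain the
derivatives by comparison with Taylor's expansion. We consider

$$J = \Pr\{m \operatorname{tr} S_1 (\Lambda^{-1} + \varepsilon)^{-1} \le 2\theta\}, \tag{3.18}$$

where $\varepsilon$ is a p × p symmetric matrix. Then by Taylor expansion we have

$$J = \Big\{ 1 + \sum \varepsilon_{rs}\partial_{rs} + \frac{1}{2!} \sum \varepsilon_{rs}\varepsilon_{tu}\partial_{rs}\partial_{tu} + \frac{1}{3!} \sum \varepsilon_{rs}\varepsilon_{tu}\varepsilon_{vw}\partial_{rs}\partial_{tu}\partial_{vw}$$
$$\qquad + \frac{1}{4!} \sum \varepsilon_{rs}\varepsilon_{tu}\varepsilon_{vw}\varepsilon_{xy}\partial_{rs}\partial_{tu}\partial_{vw}\partial_{xy} + \cdots \Big\} \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\}. \tag{3.19}$$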
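On the other hand, J is, by definition, written as

$$J = \frac{|\Lambda|^{m/2}}{(2\pi)^{pm/2}} \int_{R'} \exp[-\tfrac12 \operatorname{tr} \Lambda X_1 X_1']\, dX_1, \tag{3.20}$$

where $X_1 X_1' = mS_1$, and the domain of integration $R'$ ranges over all possible values
of the elements of $X_1$ such that $m \operatorname{tr} S_1 (\Lambda^{-1} + \varepsilon)^{-1} \le 2\theta$. It is now easy to show
that integration of (3.20) yields

$$J = \left( \frac{|I - D_\varepsilon E|}{|I - D_\varepsilon|} \right)^{-m/2} G_\rho(\theta), \tag{3.21}$$

where $D_\varepsilon$ is a diagonal matrix which satisfies

$$X_1(p \times m) = \Gamma(p \times p) Z(p \times m),$$
$$\tfrac12 \Gamma'(\Lambda^{-1} + \varepsilon)^{-1}\Gamma = I(p), \tag{3.22}$$
$$\tfrac12 \Gamma'\Lambda\Gamma = I(p) - D_\varepsilon,$$

$\Gamma$ being a nonsingular matrix, and E is an operator such that

$$E G_\rho(\theta) = G_{\rho+1}(\theta).$$

Now, letting $\Delta = E - 1$ and using (3.22), we have

$$\frac{|I - D_\varepsilon E|}{|I - D_\varepsilon|} = \frac{|I - D_\varepsilon - D_\varepsilon \Delta|}{|I - D_\varepsilon|} = \frac{|\tfrac12 \Gamma'\Lambda\Gamma - \{\tfrac12 \Gamma'(\Lambda^{-1} + \varepsilon)^{-1}\Gamma - \tfrac12 \Gamma'\Lambda\Gamma\}\Delta|}{|\tfrac12 \Gamma'\Lambda\Gamma|}$$
$$\qquad = \frac{|\Lambda - \{(\Lambda^{-1} + \varepsilon)^{-1} - \Lambda\}\Delta|}{|\Lambda|} = |I - \{\Lambda^{-1}(\Lambda^{-1} + \varepsilon)^{-1} - I\}\Delta| = |I - X\Delta|,$$

where $X = \Lambda^{-1}(\Lambda^{-1} + \varepsilon)^{-1} - I$. Hence, (3.21) becomes

$$J = |I - X\Delta|^{-m/2} G_\rho(\theta). \tag{3.23}$$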
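Now, using (3.10) again, we rewrite (3.23) as

$$J = \exp\left\{ -\frac{m}{2} \log |I - X\Delta| \right\} G_\rho(\theta)$$
$$\quad = \exp\left\{ \frac{m}{2} \operatorname{tr} X\Delta + \frac{m}{4} \operatorname{tr} X^2\Delta^2 + \frac{m}{6} \operatorname{tr} X^3\Delta^3 + \frac{m}{8} \operatorname{tr} X^4\Delta^4 + \cdots \right\} G_\rho(\theta)$$
$$\quad = \Big[ 1 + \frac{m}{2} \operatorname{tr} X\, \Delta + \left\{ \frac{m}{4} \operatorname{tr} X^2 + \frac{m^2}{8} (\operatorname{tr} X)^2 \right\} \Delta^2$$
$$\qquad + \left\{ \frac{m}{6} \operatorname{tr} X^3 + \frac{m^2}{8} (\operatorname{tr} X)(\operatorname{tr} X^2) + \frac{m^3}{48} (\operatorname{tr} X)^3 \right\} \Delta^3 \tag{3.24}$$
$$\qquad + \left\{ \frac{m}{8} \operatorname{tr} X^4 + \frac{m^2}{12} (\operatorname{tr} X)(\operatorname{tr} X^3) + \frac{m^2}{32} (\operatorname{tr} X^2)^2 + \frac{m^3}{32} (\operatorname{tr} X)^2(\operatorname{tr} X^2) + \frac{m^4}{384} (\operatorname{tr} X)^4 \right\} \Delta^4 + \cdots \Big] G_\rho(\theta).$$

X can be represented as

$$X = \Lambda^{-1}(\Lambda^{-1} + \varepsilon)^{-1} - I = (I + \varepsilon\Lambda)^{-1} - I = \left( I + \sum \varepsilon_{rs}\Lambda^{-1}_{rs}\Lambda \right)^{-1} - I$$
$$\quad = -\sum \varepsilon_{rs}(\Lambda^{-1}_{rs}\Lambda) + \sum \varepsilon_{rs}\varepsilon_{tu}(\Lambda^{-1}_{rs}\Lambda)(\Lambda^{-1}_{tu}\Lambda) - \sum \varepsilon_{rs}\varepsilon_{tu}\varepsilon_{vw}(\Lambda^{-1}_{rs}\Lambda)(\Lambda^{-1}_{tu}\Lambda)(\Lambda^{-1}_{vw}\Lambda) \tag{3.25}$$
$$\qquad + \sum \varepsilon_{rs}\varepsilon_{tu}\varepsilon_{vw}\varepsilon_{xy}(\Lambda^{-1}_{rs}\Lambda)(\Lambda^{-1}_{tu}\Lambda)(\Lambda^{-1}_{vw}\Lambda)(\Lambda^{-1}_{xy}\Lambda) - \cdots,$$

where $\Lambda^{-1}_{rs}$ is a p × p matrix obtained by operating $\partial_{rs}$ on $\Lambda^{-1}$, i.e., $\Lambda^{-1}_{rs}$ has its ith
row, jth column element, $\tfrac12 (\delta_{ri}\delta_{sj} + \delta_{rj}\delta_{si})$. Writing

$$\operatorname{tr}(\Lambda^{-1}_{rs}\Lambda) = (rs),$$
$$\operatorname{tr}(\Lambda^{-1}_{rs}\Lambda)(\Lambda^{-1}_{tu}\Lambda) = (rs \mid tu),$$
$$\operatorname{tr}(\Lambda^{-1}_{rs}\Lambda)(\Lambda^{-1}_{tu}\Lambda)(\Lambda^{-1}_{vw}\Lambda) = (rs \mid tu \mid vw),$$
$$\operatorname{tr}(\Lambda^{-1}_{rs}\Lambda)(\Lambda^{-1}_{tu}\Lambda)(\Lambda^{-1}_{vw}\Lambda)(\Lambda^{-1}_{xy}\Lambda) = (rs \mid tu \mid vw \mid xy),$$

and substituting (3.25) into (3.24), we obtain

$$J = \Big[ 1 - \frac{m}{2} \sum \varepsilon_{rs}(rs)\Delta + \frac{1}{2!} \sum \varepsilon_{rs}\varepsilon_{tu} \{ (rs \mid tu)(m\Delta + \tfrac12 m\Delta^2) + \tfrac14 m^2 (rs)(tu)\Delta^2 \}$$
$$\quad + \frac{1}{3!} \sum \varepsilon_{rs}\varepsilon_{tu}\varepsilon_{vw} \{ (rs \mid tu \mid vw)(-3m\Delta - 3m\Delta^2 - m\Delta^3) + (rs)(tu \mid vw)(-\tfrac32 m^2\Delta^2 - \tfrac34 m^2\Delta^3) - \tfrac18 m^3 (rs)(tu)(vw)\Delta^3 \}$$
$$\quad + \frac{1}{4!} \sum \varepsilon_{rs}\varepsilon_{tu}\varepsilon_{vw}\varepsilon_{xy} \{ (rs \mid tu \mid vw \mid xy)(12m\Delta + 18m\Delta^2 + 12m\Delta^3 + 3m\Delta^4) \tag{3.26}$$
$$\qquad + (rs)(tu \mid vw \mid xy)(6m^2\Delta^2 + 6m^2\Delta^3 + 2m^2\Delta^4) + (rs \mid tu)(vw \mid xy)(3m^2\Delta^2 + 3m^2\Delta^3 + \tfrac34 m^2\Delta^4)$$
$$\qquad + (rs)(tu)(vw \mid xy)(\tfrac32 m^3\Delta^3 + \tfrac34 m^3\Delta^4) + \tfrac{1}{16} m^4 (rs)(tu)(vw)(xy)\Delta^4 \} + \cdots \Big] G_\rho(\theta).$$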
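Then term by term comparison between two expansions for J, (3.19) and (3.26),
gives $\partial_{rs} \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\}$, $\partial_{rs}\partial_{tu} \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\}$, etc., but in doing so we
must take care that, for example,

$$\sum a_{ij\cdots}\varepsilon_i\varepsilon_j \cdots = \sum b_{ij\cdots}\varepsilon_i\varepsilon_j \cdots$$

implies $a_{ij\cdots} = b_{ij\cdots}$ if both $a_{ij\cdots}$ and $b_{ij\cdots}$ are completely symmetrical in their
suffixes. With this in mind and using the relation

$$\Delta G_\rho(\theta) = -E g_\rho(\theta),$$

where $g_\rho(\theta) = D G_\rho(\theta)$, we obtain

$$\partial_{rs} \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\} = \frac{m}{2} (rs) E g_\rho(\theta),$$

$$\partial_{rs}\partial_{tu} \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\} = -\left\{ \frac{m}{2} (rs \mid tu)(E^2 + E) + \frac{m^2}{4} (rs)(tu)(E^2 - E) \right\} g_\rho(\theta), \tag{3.28}$$

$$\partial_{rs}\partial_{tu}\partial_{vw} \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\} = \Big\{ m(rs \mid tu \mid vw)(E^3 + E^2 + E)$$
$$\qquad + \frac{m^2}{4} [(rs)(tu \mid vw) + (tu)(rs \mid vw) + (vw)(rs \mid tu)](E^3 - E) \tag{3.29}$$
$$\qquad + \frac{m^3}{8} (rs)(tu)(vw)(E^3 - 2E^2 + E) \Big\} g_\rho(\theta),$$

$$\partial_{rs}\partial_{tu}\partial_{vw}\partial_{xy} \Pr\{m \operatorname{tr} S_1 \Lambda \le 2\theta\}$$
$$\quad = -\Big\{ m[(rs \mid tu \mid vw \mid xy) + (rs \mid vw \mid xy \mid tu) + (rs \mid xy \mid tu \mid vw)](E^4 + E^3 + E^2 + E)$$
$$\qquad + \frac{m^2}{2} [(rs)(tu \mid vw \mid xy) + (xy)(tu \mid vw \mid rs) + (vw)(tu \mid xy \mid rs) + (tu)(vw \mid xy \mid rs)](E^4 - E)$$
$$\qquad + \frac{m^2}{4} [(rs \mid tu)(vw \mid xy) + (rs \mid vw)(tu \mid xy) + (rs \mid xy)(tu \mid vw)](E^4 + E^3 - E^2 - E) \tag{3.30}$$
$$\qquad + \frac{m^3}{8} [(rs)(tu)(vw \mid xy) + (rs)(vw)(tu \mid xy) + (rs)(xy)(tu \mid vw) + (tu)(vw)(rs \mid xy) + (tu)(xy)(rs \mid vw) + (vw)(xy)(rs \mid tu)](E^4 - E^3 - E^2 + E)$$
$$\qquad + \frac{m^4}{16} (rs)(tu)(vw)(xy)(E^4 - 3E^3 + 3E^2 - E) \Big\} g_\rho(\theta).$$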
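Upon substituting (3.28) into (3.16), we obtain

$$h_1(\Lambda^{-1}) = \frac{1}{n} \sum \sigma_{rs}\sigma_{tu} \left\{ \frac{m}{2} (st \mid ur) \left[ \frac{\theta^2}{\rho(\rho + 1)} + \frac{\theta}{\rho} \right] + \frac{m^2}{4} (st)(ur) \left[ \frac{\theta^2}{\rho(\rho + 1)} - \frac{\theta}{\rho} \right] \right\},$$

where we have used $E g_\rho(\theta) = (\theta/\rho) g_\rho(\theta)$ and $E^2 g_\rho(\theta) = \theta^2 g_\rho(\theta)/\{\rho(\rho + 1)\}$.
Now,

$$(st) = \operatorname{tr} \Lambda^{-1}_{st}\Lambda = \sum_{i,j} \tfrac12 (\delta_{si}\delta_{tj} + \delta_{sj}\delta_{ti})\sigma^{ji} = \tfrac12 (\sigma^{st} + \sigma^{ts}) = \sigma^{st}$$

and also,

$$(st \mid ur) = \operatorname{tr}(\Lambda^{-1}_{st}\Lambda)(\Lambda^{-1}_{ur}\Lambda) = \tfrac12 (\sigma^{su}\sigma^{tr} + \sigma^{sr}\sigma^{tu}),$$

where $\sigma^{ij}$ denotes the ith row, jth column element of $\Lambda$. Hence we have

$$\sum \sigma_{rs}\sigma_{tu}(st \mid ur) = \tfrac12 p(p + 1)$$

and

$$\sum \sigma_{rs}\sigma_{tu}(st)(ur) = p.$$

We also note that $2\theta = \chi^2$, $\rho = mp/2$. Therefore we finally obtain, after some
simplification,

$$h_1(\Lambda^{-1}) = \frac{1}{4n} \left\{ \frac{p + m + 1}{mp + 2} \chi^4 + (p - m + 1)\chi^2 \right\}. \tag{3.31}$$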


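In a similar way we substitute (3.29), (3.30), and (3.31) into (3.17) to evaluate
$h_2(\Lambda^{-1})$. We note here that since $h_1(\Lambda^{-1})$ given by (3.31) is independent of $\Lambda^{-1}$
the terms involving $h_1^{(st)}(\Lambda^{-1})$ and $h_1^{(st,ur)}(\Lambda^{-1})$ in (3.17) do not appear. As before,
it can be easily shown that

$$\sum \sigma_{rs}\sigma_{tu}\sigma_{vw}(st \mid uv \mid wr) = \tfrac18 p(p^2 + 3p + 4),$$

$$\sum \sigma_{rs}\sigma_{tu}\sigma_{vw}(st)(uv \mid wr) = \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}(uv)(st \mid wr) = \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}(wr)(st \mid uv) = \tfrac12 p(p + 1),$$

$$\sum \sigma_{rs}\sigma_{tu}\sigma_{vw}(st)(uv)(wr) = p,$$

$$\sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(st \mid ur \mid wx \mid yv) = \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(st \mid wx \mid yv \mid ur) = \tfrac14 p(p + 1)^2,$$

$$\sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(st \mid yv \mid ur \mid wx) = \tfrac14 p(p + 3),$$

$$\sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(st)(ur \mid wx \mid yv) = \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(yv)(ur \mid wx \mid st)$$
$$\quad = \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(wx)(ur \mid yv \mid st) = \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(ur)(wx \mid yv \mid st) = \tfrac12 p(p + 1),$$

$$\sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(st \mid ur)(wx \mid yv) = \tfrac14 p^2(p + 1)^2,$$

$$\sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(st \mid wx)(ur \mid yv) = \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(st \mid yv)(ur \mid wx) = \tfrac12 p(p + 1),$$

$$\sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(st)(ur)(wx \mid yv) = \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(wx)(yv)(st \mid ur) = \tfrac12 p^2(p + 1),$$

$$\sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(st)(wx)(ur \mid yv) = \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(st)(yv)(ur \mid wx)$$
$$\quad = \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(ur)(wx)(st \mid yv) = \sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(ur)(yv)(st \mid wx) = p,$$

$$\sum \sigma_{rs}\sigma_{tu}\sigma_{vw}\sigma_{xy}(st)(ur)(wx)(yv) = p^2.$$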


Using these results we obtain from (3.17), after some simplification,

$$h_2(\Lambda^{-1}) = \frac{1}{48n^2} \Big\{ \frac{6(p - 1)(p + 2)(m - 1)(m + 2)}{(mp + 2)^2(mp + 4)(mp + 6)} \chi^8$$
$$\qquad + \frac{4mp^3 + 2(3m^2 + 3m + 10)p^2 + 2(2m^3 + 3m^2 + 17m + 18)p + 4(5m^2 + 9m + 2)}{(mp + 2)^2(mp + 4)} \chi^6 \tag{3.32}$$
$$\qquad + \frac{13p^2 + 24p - 11m^2 + 7}{mp + 2} \chi^4 + [7p^2 + (-12m + 12)p + (7m^2 - 12m + 1)]\chi^2 \Big\},$$

which is independent of $\Lambda$ just as $h_1(\Lambda^{-1})$.
Now we substitute (3.31) and (3.32) into (3.13) to obtain

$$T_0^2 = 2h(S_0) = 2\theta + 2h_1(S_0) + 2h_2(S_0) + O(n^{-3})$$
$$\quad = \chi^2 + \frac{1}{2n} \left\{ \frac{p + m + 1}{mp + 2} \chi^4 + (p - m + 1)\chi^2 \right\}$$
$$\qquad + \frac{1}{24n^2} \Big\{ \frac{6(p - 1)(p + 2)(m - 1)(m + 2)}{(mp + 2)^2(mp + 4)(mp + 6)} \chi^8 \tag{3.33}$$
$$\qquad + \frac{4mp^3 + 2(3m^2 + 3m + 10)p^2 + 2(2m^3 + 3m^2 + 17m + 18)p + 4(5m^2 + 9m + 2)}{(mp + 2)^2(mp + 4)} \chi^6$$
$$\qquad + \frac{13p^2 + 24p - 11m^2 + 7}{mp + 2} \chi^4 + [7p^2 + (-12m + 12)p + (7m^2 - 12m + 1)]\chi^2 \Big\} + O(n^{-3}),$$

which is the asymptotic expression of a percentage point of the $T_0^2$ distribution in
terms of the corresponding percentage point of the $\chi^2$ distribution with mp
degrees of freedom.
If we put m = 1 in (3.33), we have 

mp2 2 1 ‘2 2 
=x + 5 ix + px} 


(3.34) 
L (4° + (18p — 2)x* + (7p" — 4)x°} + O(n), 


24n? 


+ 
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which is the asymptotic expression of a percentage point of the generalized 
Student 7 distribution. This result, (3.34), was previously obtained by Hotelling 
and Frankel [3], [4]. 

There is another check of (3.33) by putting p = 1 in the formula.³ In this 
case we have 

$$T_0^2 = \chi^2 + \frac{1}{2n}\{\chi^4 - (m-2)\chi^2\} + \frac{1}{24n^2}\{4\chi^6 - 11(m-2)\chi^4 + (m-2)(7m-10)\chi^2\} + O(n^{-3}),$$

which is the correct expansion for the ordinary variance ratio F with m, n degrees 
of freedom in terms of $\chi^2$ with m degrees of freedom [1]. 
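
The two checks of (3.33) described above can be repeated mechanically. The following Python sketch is an editorial illustration, not part of the original paper: it encodes the coefficients of (3.33), as we read them from the text, as exact rational functions of p and m, and confirms that they collapse to the coefficients of (3.34) when m = 1 and to those of the variance-ratio expansion when p = 1. The function names are hypothetical.

```python
from fractions import Fraction


def first_order(p, m):
    """Coefficients of chi^4 and chi^2 inside the 1/(2n) braces of (3.33)."""
    p, m = Fraction(p), Fraction(m)
    return (p + m + 1) / (m * p + 2), p - m + 1


def second_order(p, m):
    """Coefficients of chi^8, chi^6, chi^4, chi^2 inside the 1/(24 n^2) braces."""
    p, m = Fraction(p), Fraction(m)
    a = m * p + 2
    c8 = 6 * (p - 1) * (p + 2) * (m - 1) * (m + 2) / (a**2 * (a + 2) * (a + 4))
    c6 = (4 * m * p**3
          + 2 * (3 * m**2 + 3 * m + 10) * p**2
          + 2 * (2 * m**3 + 3 * m**2 + 17 * m + 18) * p
          + 4 * (5 * m**2 + 9 * m + 2)) / (a**2 * (a + 2))
    c4 = (13 * p**2 + 24 * p - 11 * m**2 + 7) / a
    c2 = 7 * p**2 + (-12 * m + 12) * p + (7 * m**2 - 12 * m + 1)
    return c8, c6, c4, c2


def reduces_to_hotelling_frankel(p):
    """m = 1: the coefficients should be those of (3.34)."""
    return (first_order(p, 1) == (1, p)
            and second_order(p, 1) == (0, 4, 13 * p - 2, 7 * p**2 - 4))


def reduces_to_variance_ratio(m):
    """p = 1: the coefficients should be those of the F expansion."""
    return (first_order(1, m) == (1, 2 - m)
            and second_order(1, m)
            == (0, 4, -11 * (m - 2), (m - 2) * (7 * m - 10)))
```

Because the arithmetic is exact, agreement for a range of integer values of p (or m) is a genuine algebraic check of each polynomial identity rather than a floating-point coincidence.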


4. Asymptotic formula for the c.d.f. of $T_0^2$. Let $F(2\theta_1)$ be the c.d.f. of $T_0^2$, i.e., 


(4.1) $F(2\theta_1) = \Pr\{m\,\mathrm{tr}\, S_1 S_0^{-1} \le 2\theta_1\}.$ 


Then, as in (3.6), we can write 

(4.2)
$$\Pr\{m\,\mathrm{tr}\, S_1 S_0^{-1} \le 2\theta_1\} = \int_R \Pr\{m\,\mathrm{tr}\, S_1 S_0^{-1} \le 2\theta_1 \mid S_0\}\,\Pr\{dS_0\} = \Theta\,\Pr\{m\,\mathrm{tr}\, S_1 \Lambda \le 2\theta_1\},$$

where $\Theta$ is given by (3.12). Upon substituting (3.28), (3.29), and (3.30) into 
(4.2) we obtain, after some simplification, 

. a 1 (2p+m 1)6; } 

F(26) = Go(6:) — 2 {2P+m+ Uhr 


2n\ mp+2 + (p— m+ 1619661) 


24{mp* + 2(m? + m + 4)p + (m*> + 2m? + 21m + 20)p 
1 7 : + 8m? + 20m + 20}64 
48n? (mp + 2)(mp + 4)(mp + 6) 
4{3mp* — 2(3m? — 3m — 4)p’ — 3(3m* + 2m? + 11m — 4)p 
— 40m” — 36m — 4}6} 
(mp + 2)(mp + 4) 
2{3mp* + 2(3m? + 3m — 4)p’ — 3(3m> — 2m? — 5m + 4)p 
— 8m? + 12m + 4367 


mp + 2 
{3mp* — 2(3m> — 3m + 4)p’ + 3(m*> — 2m’ + 5m — 4)p 


8m? + 12m + 436, |g,(A) + O(n), 


3 The author is indebted to the referee for pointing out this check of (3.33). 
where 

$$G_\nu(\theta_1) = [\Gamma(\nu)]^{-1}\int_0^{\theta_1} t^{\nu-1} e^{-t}\,dt, \qquad g_\nu(\theta_1) = \frac{\partial}{\partial\theta_1}\,G_\nu(\theta_1), \qquad \text{and } \nu = mp/2.$$

(4.3) is a sort of multivariate analogue of Hartley’s formula of “Studentization.” 
In fact it can be shown that when p = 1, (4.3) coincides with Hartley’s 
formula for the c.d.f. of the univariate analysis of variance F statistic. (See 
equation (28), p. 178, [2].) 
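
The leading term of (4.3) is the function $G_\nu$ with $\nu = mp/2$, which is the c.d.f. of a $\chi^2$ variate with mp degrees of freedom evaluated at $2\theta_1$. The following Python sketch is an editorial illustration under that reading: it evaluates $G_\nu$ by the standard power series for the incomplete gamma function and $g_\nu$ in closed form, and provides the elementary closed form of the $\chi^2$ c.d.f. for even degrees of freedom for comparison. The function names and the number of series terms are assumptions made for the example.

```python
import math


def G(nu, theta, terms=400):
    """G_nu(theta) = (1/Gamma(nu)) * integral_0^theta t^(nu-1) e^(-t) dt,
    evaluated by the power series of the lower incomplete gamma function."""
    if theta <= 0.0:
        return 0.0
    term = 1.0 / nu
    total = term
    for k in range(1, terms):
        term *= theta / (nu + k)
        total += term
    return total * math.exp(nu * math.log(theta) - theta - math.lgamma(nu))


def g(nu, theta):
    """g_nu(theta) = dG_nu/dtheta = theta^(nu-1) e^(-theta) / Gamma(nu)."""
    return math.exp((nu - 1.0) * math.log(theta) - theta - math.lgamma(nu))


def chi2_cdf_even(df, x):
    """Closed form of Pr{chi^2_df <= x}, valid only for even df."""
    half, theta = df // 2, x / 2.0
    tail = sum(theta**k / math.factorial(k) for k in range(half))
    return 1.0 - math.exp(-theta) * tail
```

For example, with m = 2 and p = 3 one has $\nu = 3$, and `G(3.0, 2.5)` should agree with `chi2_cdf_even(6, 5.0)`.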

5. Discussion of the error and remarks. In view of the methods used in 
Sections 3 and 4, it is rather difficult to set a bound for the error committed by 
omitting all terms after the first few in the asymptotic formula (3.33) for $T_0^2$ 
or in the asymptotic formula (4.3) for the c.d.f. of $T_0^2$. There is, however, 
a method of finding lower and upper bounds for the c.d.f. of $T_0^2$ that are fairly good 
for large values of n, and these can be used to set a bound for $O(n^{-3})$, say, in the 
asymptotic expansion of the c.d.f. of $T_0^2$. 

We shall first obtain lower and upper bounds for the c.d.f. of $T_0^2$. It is well 
known (e.g., see [7]) that the joint probability law of the characteristic roots 
$e_1, e_2, \ldots, e_s$ of $m S_1 S_0^{-1}$ under the null hypothesis $H_0$ is given by 


P(ex, €2, °°* 5 Ce) 


’ 5 = s—l 
= C(s, t, D, n) Il rs 2 (1 rs st int 2 de; I (e; — e;), 


t=! i<j=l 


5.1) 


where $0 \le e_s \le e_{s-1} \le \cdots \le e_1 < \infty$, $s = \min(p, m)$, $t = \max(p, m)$, and 


3/2 


rr ris(n+t—p+t+i)} 
Cls, 6 Pm) = Ta Nay t—s+a}Tihin — p+air G/2 
The statistic $T_0^2$ is expressed as 

$$T_0^2 = m\,\mathrm{tr}\, S_1 S_0^{-1} = \sum_{i=1}^{s} e_i,$$

and the c.d.f. of $T_0^2$ is given by 


F(26:) = C(s, t, p, n) |: -f 
(5.3) : 


e s—l 
II eee (1 + _ 2 de; II (e; sa e;), 


i=] 


t<j=l 
where $R_1$ is the domain of integration such that $0 \le e_s \le e_{s-1} \le \cdots \le e_1 < \infty$ 
and $0 \le \sum_{i=1}^{s} e_i \le 2\theta_1$. Now for any non-negative values of $e_i$ and $n$, the 
following inequality holds: 

$$\log\left(1 + \frac{e_i}{n}\right) \le \frac{e_i}{n} \qquad (i = 1, 2, \ldots, s),$$

where equality holds when $e_i = 0$ or $n \to \infty$. Hence we have 


$$\prod_{i=1}^{s}\left(1 + \frac{e_i}{n}\right)^{-(m+n)/2} \ge \exp\left\{-\frac{m+n}{2n}\sum_{i=1}^{s} e_i\right\}.$$
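
The inequality just obtained is what turns the exact density into a tractable lower bound, so it may help to see it numerically. The following Python sketch is an editorial illustration: it evaluates both sides of the product inequality, as we read it from the text, for given roots $e_i$ and given m and n. The function names and the sample values used with them are hypothetical.

```python
import math


def exact_factor(es, m, n):
    """Product over the roots of (1 + e_i/n)^(-(m+n)/2)."""
    return math.prod((1.0 + e / n) ** (-(m + n) / 2.0) for e in es)


def lower_bound(es, m, n):
    """exp{-(m+n)/(2n) * sum of the roots}, which follows from log(1+u) <= u."""
    return math.exp(-(m + n) / (2.0 * n) * sum(es))


def bound_ratio(es, m, n):
    """Ratio lower_bound / exact_factor; it lies in (0, 1] and tends to 1 as n grows."""
    return lower_bound(es, m, n) / exact_factor(es, m, n)
```

For fixed roots, `bound_ratio` increases toward 1 as n increases, which is the sense in which the bounds of this section are "fairly good for large values of n".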
Therefore, the probability law (5.1) is bounded from below as follows: 
(5.4) Pi(ei1, *°* , es) =< Pla 5 +** ~@) 
where 


& 8 s—1 
Py(er, «++ ,@) = C(s, t, p, n)[ Jes” de; exp| "Et" «| (a 
1 


im] = i<ju 


It must be noted here that $P_1(e_1, \ldots, e_s)$ is not a probability law, although it is 
non-negative for all $e_i$ such that $0 \le e_s \le \cdots \le e_1 < \infty$. Now integrating both 
sides of (5.4) in $R_1$ we obtain 


(5.5) F,(24) = F(24), 


where 


F,(26;) = C(s, t, P; n) / vs | II or de; 
Ry; i=] 


m+n —~ 
exp | - ~ on ~ | II (e; _ ej), 


i=1 i<j=1 
and also integrating both sides of (5.4) in R. whereO0 Se, S--- Se, < & 
and 26, Ss aa e; < » and subtracting each from 1, we have 
(5.6) F(26;) < F2(26), 
where 


F (26) =l1- C(s, t, PP; n) / pts /u ee or de; 
2 


i=1 


. s—l 
m+n 
exp| -’ 9 y | Il (e; — @). 
n i=l] i<j=1 
In order to evaluate $F_1(2\theta_1)$ and $F_2(2\theta_1)$, we observe that as n tends to $\infty$, 
$T_0^2 = \sum e_i$ has a $\chi^2$ distribution with st degrees of freedom in the limit; i.e., 
we have 


K(s, t, p) [ i {i eft? de. 
: 1 


t=1 


(5.7) 


exp| -1 > «| Td. i be 


i=l i<j=l 
where 

$$K(s, t, p) = \lim_{n \to \infty} C(s, t, p, n)$$

and $\nu_1 = st/2$. Hence integration of (5.5) yields 


(5.8) $F_1(2\theta_1) = L(s, t, p, n)\, G_{\nu_1}\!\left(\frac{m+n}{n}\,\theta_1\right),$ 


where 


L(s, t, p,n) = C(s, bP, ¥ ( — 
m 


-K(s, t, p) 


Similarly we obtain from (5.6) 

(5.9) $F_2(2\theta_1) = 1 - L(s, t, p, n)\left\{1 - G_{\nu_1}\!\left(\frac{m+n}{n}\,\theta_1\right)\right\}.$ 

Now if we write (4.3) as 

(5.10) $F(2\theta_1) = a_0 + \dfrac{a_1}{n} + \dfrac{a_2}{n^2} + R_3,$ 

where $R_3$ is the error committed by omitting all terms except the first three terms 
in the asymptotic series of $F(2\theta_1)$, the absolute value of $R_3$ has the following 
upper bound: 

(5.11) $|R_3| \le \max\left\{\left|F_1(2\theta_1) - a_0 - \dfrac{a_1}{n} - \dfrac{a_2}{n^2}\right|,\ \left|F_2(2\theta_1) - a_0 - \dfrac{a_1}{n} - \dfrac{a_2}{n^2}\right|\right\},$ 

where $F_1(2\theta_1)$ and $F_2(2\theta_1)$ are given by (5.8) and (5.9), respectively. 

The actual manner in which (3.33) converges to the true value $T_0^2$ or in which 
(4.3) converges to the true value $F(2\theta_1)$ is not known, but it is hoped that the 
use of the first few corrective terms may result in a test which is more accurate 
than the $\chi^2$ approximation, at any rate for moderately large values of n. In the 
case of the asymptotic formula (4.3) for the c.d.f. of $T_0^2$, we may judge the 
magnitude of the error involved in using the first few terms of the series by (5.11), 
which turns out to be rather small numerically when n is sufficiently large. 

The author wishes to express his indebtedness to Professor Harold Hotelling 
for suggesting this problem and for his guidance in the preparation of this paper. 
SOME RESULTS USEFUL IN MULTIVARIATE ANALYSIS¹ 


By K. C. S. Pillai 
University of North Carolina and University of Travancore 


1. Summary and Introduction. In this paper a few results (of a purely mathe- 
matical nature) are obtained, which are useful for studying certain distribution 
problems in multivariate analysis—e.g., those relating to the characteristic 
roots of a determinantal equation ([1], [2], [5]). In particular, the results are 
shown to be readily applicable to the moment problems of the sum of the roots 
and the distributions of the extreme roots. Most of the results given are in the 
form of certain recursion formulae for reducing special types of k-th order 
Vandermonde determinants in terms of those of orders (k — 1) and (k — 2). 
The applications of these results are given by S. N. Roy [6] and the present 
author [3]. 


2. Vandermonde’s determinant. Let us first consider a type of determinant 
(due to Vandermonde) which plays an important role in the development of 
this paper. Denote by $V_0$ the Vandermonde determinant of the form 


where $X_1, X_2, \ldots, X_k$ are k variables. The determinant can be shown to be 
equal to the expression 


(2.2) $V_0 = \prod_{i>j} (X_i - X_j),$ 


where $\prod$ denotes the product over the k variables. The determinant $V_0$ has 
several interesting properties, of which the following will be used in this paper. 

Property 1. If each of the indices of the first j columns of $V_0$ is increased by 
unity, the resulting determinant 


(2.3) $V_j$ (say) $= \left(\sum X_1 X_2 \cdots X_j\right) V_0,$ 


where $\sum X_1 X_2 \cdots X_j$ denotes the j-th elementary symmetric function in k 
variables $X_1, X_2, \ldots, X_k$. 
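
Property 1 is easy to test on numerical examples. The following Python sketch is an editorial illustration, not part of the original paper. Since the display (2.1) is not legible in this copy, the sketch assumes that the i-th row of $V_0$ consists of the powers $X_i^{k-1}, X_i^{k-2}, \ldots, X_i, 1$; the identity being checked does not depend on the sign convention of $V_0$, because both sides carry the same factor. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def det(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in rows]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            sign = -sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= factor * a[c][cc]
    return sign * result


def power_matrix(xs, exponents):
    """Row i holds xs[i] raised to each exponent, one exponent per column."""
    return [[x**e for e in exponents] for x in xs]


def elementary_symmetric(xs, j):
    """j-th elementary symmetric function of the numbers xs."""
    return sum(prod(c) for c in combinations(xs, j))


def check_property_1(xs, j):
    """True if raising the exponents of the first j columns by one
    multiplies the determinant by the j-th elementary symmetric function."""
    k = len(xs)
    base = list(range(k - 1, -1, -1))          # assumed layout of (2.1)
    bumped = [e + 1 if col < j else e for col, e in enumerate(base)]
    v0 = det(power_matrix(xs, base))
    vj = det(power_matrix(xs, bumped))
    return vj == elementary_symmetric(xs, j) * v0
```

For instance, `check_property_1([2, 3, 5, 7], 2)` compares the determinant with column exponents (4, 3, 1, 0) against $V_0$ times the second elementary symmetric function of 2, 3, 5, 7.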


Received September 28, 1955; revised July 9, 1956. 
1 Part of a doctoral thesis submitted to the Department of Statistics, University of 
North Carolina, Chapel Hill; work done under the sponsorship of the Ford Foundation. 
2 Now with United Nations, New York. 
3. A special function and the corresponding determinant. Let us consider the 
integral over the domain $0 < x_1 \le x_2 \le \cdots \le x_k \le x < 1$ of a function given 
by 


(3.1) $f(x_1, x_2, \ldots, x_k) = \prod_{i=1}^{k}\{x_i^{q}(1 - x_i)^{r} e^{t x_i}\} \prod_{i>j}(x_i - x_j),$ 

where $q, r > -1$ and $t$ is independent of the $x$'s. It is obvious that, in view of 
(2.2), $\prod_{i>j}(x_i - x_j)$ can be thrown into the form of a determinant as in (2.1). 
Multiply the i-th row of this determinant by $x_{k-i+1}^{q}(1 - x_{k-i+1})^{r} e^{t x_{k-i+1}}$ 
$(i = 1, 2, \ldots, k)$, and integrate between appropriate limits, each term of the 
determinant with respect to the variable it involves. Then the integral of the 
function $f(x_1, x_2, \ldots, x_k)$ given by (3.1) takes the form 

8 


+k— r r z | 
ap** (1 — a,)’e** day --- ai(1 — 2)’ e'** da. 


| +k—1 rt > 2 | 
[at — me de «++ | afl — 2)" e* das | 
| “0 /0 | 


It has to be remembered that in expanding the determinant (3.2), the order of 
integration must not be changed, and hence we shall call it a “pseudo-determinant.” 

Now let $q_1, q_2, \ldots, q_k$ be real numbers greater than $-1$. Let us denote by 
$U(x; q_k, r; \ldots; q_1, r; t)$ the pseudo-determinant 


| p2 -z | 


|| afk(1 — aze)’e'** day ++: i(1 — zy)"e'™* dae 
. “0 


2 
xi(1 - )"e"™ dz eee I a(1 = 2)"e"™ dx, 
0 


0 
More generally, if we replace r in the j-th column of (3.3) by 
$r_{k-j+1}$ $(j = 1, 2, \ldots, k)$, 
the resulting pseudo-determinant will be denoted by 
(3.4) $U(x; q_k, r_k; q_{k-1}, r_{k-1}; \ldots; q_1, r_1; t);$ 


or more explicitly by 


(3.5) 
Further, in the pseudo-determinant (3.3), if the indices of the i-th row alone 
are different from those of the rest, we denote the resulting determinant by 


(3.6) $U(x; q_k'', r''; \ldots; q_1'', r''; t)^{(i)},$ 


where $q_k'', r'', \ldots, q_1'', r''$ denote the indices of the i-th row. 
Since the integral of $f(x_1, x_2, \ldots, x_k)$ involves integrals of the type 


(3.7) $I(x'; q, r; F; t) = \int_0^{x'} x^{q}(1 - x)^{r} F(x)\, e^{tx}\,dx,$ 


where F(x) is a function of x such that the integral in (3.7) exists, let us first 
consider the integral (3.7). If F(x) is of the form 


ez 


z2 
20 - tz,— f t 
(3.8) | ots (1 —_ Le-1)'€ ~~ s dxp-1 we'4 i x(1 = 2)" *1 dz, 
0 0 


the integral (3.7) may be denoted by 
(3.9) L(2'3, 73 Mery 73° 5H, 7; 2b). 


Now consider $I(x'; q, r; F; t)$. Integrating (3.7) by parts we obtain the result 
stated in the following lemma: 
LEMMA 1. 

(3.10)
$$\begin{aligned} I(x'; q, r; F; t) = (q + r + 1)^{-1}\{&-I_0(x'; q, r+1; F; t) + I(x'; q, r+1; F'; t) \\ &+ q\,I(x'; q-1, r; F; t) + t\,I(x'; q, r+1; F; t)\}, \end{aligned}$$

where 

$$I_0(x'; q, r+1; F; t) = x^{q}(1 - x)^{r+1} F(x)\, e^{tx}\Big|_0^{x'}, \qquad F' = \frac{dF}{dx}.$$

It may be noted that the right-hand side of (3.10) has been obtained by 
integrating $(1 - x)^{q+r}$ and differentiating the product of $x^{q}/(1 - x)^{q}$, $F(x)$ and 
$e^{tx}$, treating this product as the u term in $\int u\,dv$. Using Lemma 1, let us consider 
the integration of the function in (3.3) when k = 2. 
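
Lemma 1 is an identity between one-dimensional integrals and can therefore be checked by quadrature. The following Python sketch is an editorial illustration, not part of the original paper: it evaluates both sides of (3.10), as we read it from the text, with composite Simpson integration. The function names, the choice of quadrature rule and the number of panels are assumptions made for the example.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def I(xp, q, r, F, t):
    """I(x'; q, r; F; t): integral over [0, x'] of x^q (1-x)^r F(x) e^(tx)."""
    return simpson(lambda x: x**q * (1.0 - x)**r * F(x) * math.exp(t * x), 0.0, xp)


def I0(xp, q, r_plus_1, F, t):
    """Boundary term x^q (1-x)^(r+1) F(x) e^(tx) evaluated between 0 and x'."""
    def value(x):
        return x**q * (1.0 - x)**r_plus_1 * F(x) * math.exp(t * x)
    return value(xp) - value(0.0)


def lemma1_sides(xp, q, r, F, dF, t):
    """Return (left-hand side, right-hand side) of (3.10); dF is F'."""
    lhs = I(xp, q, r, F, t)
    rhs = (-I0(xp, q, r + 1, F, t)
           + I(xp, q, r + 1, dF, t)
           + q * I(xp, q - 1, r, F, t)
           + t * I(xp, q, r + 1, F, t)) / (q + r + 1)
    return lhs, rhs
```

A call such as `lemma1_sides(0.6, 2, 1.5, lambda x: 1 + x * x, lambda x: 2 * x, 0.7)` should return two values that agree to within the quadrature error. Taking q at least 1 in such a check keeps every integrand bounded at the origin.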

THEOREM 1. The pseudo-determinant 

(3.11)
$$\begin{aligned} U(x; q_2, r; q_1, r; t) = (q_2 + r + 1)^{-1}\{&-I_0(x; q_2, r+1; q_1, r; t) + 2I(x; q_2 + q_1, 2r+1, 2t) \\ &+ q_2\,U(x; q_2 - 1, r; q_1, r; t) + t\,U(x; q_2, r+1; q_1, r; t)\}. \end{aligned}$$

PROOF. First, note that $U(x; q_2, r; q_1, r; t) = I(x; q_2, r; q_1, r; t) - I(x; q_1, r; q_2, r; t)$. 
Integrate the latter integrals by parts using Lemma 1 so 
as to reduce the index $q_2$ in each case by unity. The sum of all the terms thus 
obtained after integration gives the right-hand side of (3.11). For a more detailed 
proof of the theorem, the reader is referred to [3]. 
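
The reduction in Theorem 1 can also be checked by nested quadrature. The following Python sketch is an editorial illustration resting on two stated assumptions: the second-order pseudo-determinant is read as the difference of the two iterated integrals named in the proof, and $I_0(x; q_2, r+1; q_1, r; t)$ is read as $x^{q_2}(1-x)^{r+1}e^{tx}$ multiplied by $I(x; q_1, r; t)$, with the factors $q_2$ and $t$ placed as in Theorem 3. The function names and the quadrature settings are hypothetical.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def w(y, q, r, t):
    """Integrand y^q (1-y)^r e^(ty)."""
    return y**q * (1.0 - y)**r * math.exp(t * y)


def I1(x, q, r, t):
    """Single integral of w over [0, x]."""
    return simpson(lambda y: w(y, q, r, t), 0.0, x)


def U2(x, qa, ra, qb, rb, t):
    """Assumed reading of the second-order pseudo-determinant:
    the iterated integral with (qa, ra) outermost minus the one with (qb, rb) outermost."""
    def integrand(y):
        return (w(y, qa, ra, t) * I1(y, qb, rb, t)
                - w(y, qb, rb, t) * I1(y, qa, ra, t))
    return simpson(integrand, 0.0, x)


def theorem1_sides(x, q2, q1, r, t):
    """Return both sides of (3.11), each multiplied by (q2 + r + 1)."""
    lhs = (q2 + r + 1) * U2(x, q2, r, q1, r, t)
    boundary = x**q2 * (1.0 - x)**(r + 1) * math.exp(t * x) * I1(x, q1, r, t)
    rhs = (-boundary
           + 2.0 * I1(x, q2 + q1, 2 * r + 1, 2 * t)
           + q2 * U2(x, q2 - 1, r, q1, r, t)
           + t * U2(x, q2, r + 1, q1, r, t))
    return lhs, rhs
```

With, say, `theorem1_sides(0.8, 4, 2, 1, 0.5)` the two returned values should agree up to the quadrature error. Choosing $q_2 - 1 \ne q_1$ keeps the third term from vanishing identically.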








In (3.11) the last pseudo-determinant can further be shown to be equal to the 
difference of two others given by 
(3.12) $U(x; q_2, r+1; q_1, r; t) = U(x; q_2, r; q_1, r; t) - U(x; q_2 + 1, r; q_1, r; t).$ 

For integration in the general case of the function contained in the pseudo- 
determinant (3.3), some more results have to be used. These results are stated 
as lemmas in the following section. For the detailed proofs of these lemmas the 
reader is referred to [3]. 

4. Certain properties of I-functions. This section is devoted to the statement 
of two lemmas which will be used in the next section. 

LEMMA 2. If $(q_k', \ldots, q_1')$ denotes any permutation of $(q_k, \ldots, q_1)$, then 


(4.1) $\sum I(x; q_k', r; \ldots; q_1', r; t) = \prod_{j=1}^{k} I(x; q_j, r, t),$ 


where the summation $\sum$ extends over all possible permutations. 

Lemma 3. If U(a; qe , 7”, t’3--- 391, 7”, t”) denotes the pseudo-determinant 
in (3.6) with t” for the index of the i-th row instead of t, which is the index every- 
where else, then 

1 


> (—1)*"U(2; gh, r” t”; Ts. a1 r” {”) 


"  & (— 1) "I (2; qi; os t”)U(a; Gk T° ** 5 Qi4ts 7, Qj-1,7, °°* 51,7; t). 


jak 


5. Pseudo determinant of order 3. In this section we shall prove the following 
theorem: 
THEOREM 2. The pseudo-determinant 

(5.1)
$$\begin{aligned} U(x; q_3, r; q_2, r; q_1, r; t) = (q_3 + r + 1)^{-1}\{&-I_0(x; q_3, r+1; t)\,U(x; q_2, r; q_1, r; t) \\ &+ 2I(x; q_3 + q_2, 2r+1, 2t)\,I(x; q_1, r, t) \\ &- 2I(x; q_3 + q_1, 2r+1, 2t)\,I(x; q_2, r, t) \\ &+ q_3\,U(x; q_3 - 1, r; q_2, r; q_1, r; t) \\ &+ t\,U(x; q_3, r+1; q_2, r; q_1, r; t)\}. \end{aligned}$$

PROOF. Expand the pseudo-determinant $U(x; q_3, r; q_2, r; q_1, r; t)$ as follows: 


( 93,7 / G2, Q,? : 
U 42; gas? Qiyr|;t} + U4a;3] @s,r ag 
ain g2,7 Giyt q,% y,T 
g2,7r i,t? 


Q2,7r QT }3t 
It has to be understood that in each of the pseudo-determinants in (5.2), there 
are no elements in the positions left blank. Each component in (5.2) stands for 
the product of an element of the first column and its cofactor in the third-order 
pseudo-determinant $U(x; q_3, r; q_2, r; q_1, r; t)$. Since the order of integration 
must not be changed, the product is not written explicitly. Now using Lemma 1, 
integrate by parts the first pseudo-determinant in (5.2) with respect to $x_3$, the 
second with respect to $x_2$, and the third with respect to $x_1$. Add the expressions 
obtained corresponding to each of the four terms on the right-hand side of 
(3.10). This yields 


(5.3) (a+tr+i1) (A® + B® + 4c + td”), 
where 


(5.4) A® = -—I(z;q;,r+1;0U(aia,73n,7; 0; 

(55) B® = 22) (-1)U(@; gs + a2, 2r + 1, 2b; G3 + Mm, Ar + 1,2)"; 
t=—1 

(5.6) C® = U(z;q@ —1rs@,r3u,73 03; 


and 
(5.7) D” = U(t;q,r + 134,73 H,75 ?). 


Now apply Lemma 3 to the right-hand side of (5.5) with k = 2; we at once get 
the result (5.1). 


6. Pseudo determinant of order k. We generalize the results of Theorem 2 
in the following theorem: 
THEOREM 3. The pseudo-determinant 


(6.1) $U(x; q_k, r; q_{k-1}, r; \ldots; q_1, r; t) = (q_k + r + 1)^{-1}(A^{(k)} + B^{(k)} + q_k C^{(k)} + tD^{(k)}),$ 


where 


(6.2) $A^{(k)} = -I_0(x; q_k, r+1; t)\,U(x; q_{k-1}, r; \ldots; q_1, r; t);$ 

(6.3) $B^{(k)} = 2\sum_{j=1}^{k-1} (-1)^{k-j-1}\, I(x; q_k + q_j, 2r+1, 2t)\,U(x; q_{k-1}, r; \ldots; q_{j+1}, r; q_{j-1}, r; \ldots; q_1, r; t);$ 

(6.4) $C^{(k)} = U(x; q_k - 1, r; q_{k-1}, r; \ldots; q_1, r; t);$ 
and 
(6.5) $D^{(k)} = U(x; q_k, r+1; q_{k-1}, r; \ldots; q_1, r; t).$ 


The proof of this theorem follows step by step that of Theorem 2. 

It may be noted that the pseudo-determinant in (6.5) can be expressed as a 
difference of two others, as given below: 

(6.6) $U(x; q_k, r+1; q_{k-1}, r; \ldots; q_1, r; t) = U(x; q_k, r; \ldots; q_1, r; t) - U(x; q_k + 1, r; q_{k-1}, r; \ldots; q_1, r; t).$ 
Further note that if $q_k = q_{k-1} + 1$ in Theorem 3, the pseudo-determinant (6.4) 
vanishes. 

Now we state below another theorem which can be proved by employing 
techniques similar to those used to prove Theorem 3. 


THeoreM 4. Jf W(y; ax, b; ax, b; --- 3 a1, 0; —t) denotes the pseudo-determi- 
nant 


| y . | 
| yt e */(] 4 yx)” dy. pint yi! e */(1 + yx)” dy. 
0 


v2 la b 3 | 
| yi" e "7/1 + 9) dy: oT 
0 0 | 
wherea; > —1,b>a;+1,(@ = 1,2,--- 

0<nSyeS--: 


then 


Wy; ae, D3 Ger, 0; +--+ 5a, 0; 
(6.8) 


= (b-— a —1)'(P™ + Q” + ak — tS), 
where 


(6.9) P™ = —Fy(y; a,b — 1, —t)W(y; ara, db; ++ 5a, 0; —2); 
1 

610) 2 Zz F(y; a, + a; , 2b — 1, — 2¢) 

*W(y; aus, b; +++ 5 O51, 0; 51,0; +++ 50,0; — 8, 
(6.11) $R^{(k)} = W(y; a_k - 1, b; a_{k-1}, b; \ldots; a_1, b; -t),$ 
and 
(6.12) $S^{(k)} = W(y; a_k, b - 1; a_{k-1}, b; \ldots; a_1, b; -t).$ 
The pseudo-determinant (6.12) can be expressed as the sum of two others, as follows: 
(6.13) $S^{(k)} = W(y; a_k, b; \ldots; a_1, b; -t) + W(y; a_k + 1, b; \ldots; a_1, b; -t).$ 


7. Applications to multivariate analysis. The results given by Theorems 3 
and 4 are useful for certain distribution problems in multivariate analysis. 
Consider the well-known distribution of the non-zero roots ($0 < \theta_1 \le \theta_2 \le \cdots 
\le \theta_s < 1$; $s \le p$, the number of variates) of a determinantal equation in 
multivariate analysis given by R. A. Fisher [1], P. L. Hsu [2] and S. N. Roy [5]. It 
can be written in the form 


(7.1) $p(\theta_1, \ldots, \theta_s) = C(s, m, n) \prod_{i=1}^{s} \theta_i^{m}(1 - \theta_i)^{n} \prod_{i>j} (\theta_i - \theta_j), \qquad 0 < \theta_1 \le \cdots \le \theta_s < 1,$ 
where 


(7.2) $C(s, m, n) = \pi^{s/2} \prod_{i=1}^{s} \Gamma\{(2m + 2n + s + i + 2)/2\} \Big/ \prod_{i=1}^{s} \left[\Gamma\{(2m + i + 1)/2\}\,\Gamma\{(2n + i + 1)/2\}\,\Gamma(i/2)\right].$ 

For the interpretation of m and n, see [4]. 
Now let $V^{(s)} = \sum_{i=1}^{s} \theta_i$. Consider the moment generating function of $V^{(s)}$ 
given by $E\{\exp tV^{(s)}\}$, where E denotes mathematical expectation. It is easy 
to see that, in (3.1), if we put 

(7.3) $q = m$, $r = n$, $k = s$, $x_i = \theta_i$, and 

multiply the resulting expression by C(s, m, n) given in (7.2), and integrate 
with respect to the $\theta$'s over the domain $0 < \theta_1 \le \cdots \le \theta_s < 1$, we at once obtain 
$E\{\exp tV^{(s)}\}$. In other words, $E\{\exp tV^{(s)}\}$ is obtained from the 
pseudo-determinant (3.2) after substitutions (7.3) and multiplication by C(s, m, n). Now 
apply Theorem 3 to $E\{\exp tV^{(s)}\}$. We obtain 

(7.4)
$$\begin{aligned} (m + n + s)\,E(e^{tV^{(s)}}) = {}& 2C(s, m, n) \sum_{j=1}^{s-1} (-1)^{s-j-1} \{I(1; 2m + s + j - 2, 2n + 1, 2t) \\ &\times U(1; m + s - 2, n; \ldots; m + j, n; m + j - 2, n; \ldots; m, n; t)\} \\ &+ tC(s, m, n)\,U(1; m + s - 1, n + 1; m + s - 2, n; \ldots; m, n; t). \end{aligned}$$


The simplification here resulted from the fact that the A term (since $x = 1$) 
and (since $q_k = q_{k-1} + 1$) the C term of Theorem 3 both vanish. Now in view 
of the result (6.6), 

(7.5) $U(1; m + s - 1, n + 1; m + s - 2, n; \ldots; m, n; t) = (1/C(s, m, n))\,E\{\exp tV^{(s)}\} - U(1; m + s, n; m + s - 2, n; \ldots; m, n; t).$ 

Further, using Property 1 given in (2.3), 

(7.6) $U(1; m + s - 2, n; \ldots; m + j, n; m + j - 2, n; \ldots; m, n; t) = (1/C(s - 2, m, n))\,E\{(\sum \theta_1 \cdots \theta_{s-j-1})\, e^{tV^{(s-2)}}\},$ 

where $\sum \theta_1 \cdots \theta_{s-j-1}$ denotes the (s − j − 1)-th elementary symmetric 
function in (s − 2) variables $\theta_1, \ldots, \theta_{s-2}$. Again using the same property in (7.5), 

(7.7) $U(1; m + s, n; m + s - 2, n; \ldots; m, n; t) = (1/C(s, m, n))\,E(V^{(s)} e^{tV^{(s)}}).$ 
Now make use of (7.5) to (7.7) in (7.4) and we get 

(7.8)
$$\begin{aligned} (m + n + s - t)\,E(e^{tV^{(s)}}) + tE(V^{(s)} e^{tV^{(s)}}) = {}& \{2C(s, m, n)/C(s - 2, m, n)\} \\ &\times \sum_{j=1}^{s-1} (-1)^{s-j-1} I(1; 2m + s + j - 2, 2n + 1, 2t)\, E\{(\textstyle\sum \theta_1 \cdots \theta_{s-j-1})\, e^{tV^{(s-2)}}\}. \end{aligned}$$

To illustrate the use of (7.8), let us put s = 2. This yields 

(7.9) $(m + n - t + 2)\,E(e^{tV^{(2)}}) + tE(V^{(2)} e^{tV^{(2)}}) = 2C(2, m, n)\,I(1; 2m + 1, 2n + 1, 2t).$ 


Noting that $I(1; 2m + 1, 2n + 1, 2t)$ is a confluent hypergeometric function 
which can be expanded as a power series in 2t, and that $\exp tV^{(2)}$ also can be 
expanded as a power series in t, equating the coefficients of like powers of t 
on both sides of (7.9) yields 

(7.10) $(m + n + i + 2)\,\mu_i^{\prime(2)} - i\,\mu_{i-1}^{\prime(2)} = \dfrac{2^i (2m + 2) \cdots (2m + i + 1)(m + n + 2)}{(2m + 2n + 4) \cdots (2m + 2n + i + 3)} \qquad (i = 1, 2, \ldots),$ 

where $\mu_i^{\prime(2)}$ denotes the i-th raw moment for 2 roots. After successive substitutions 
of lower order moments given by the respective recurrence relations (7.10), we 
get 


_ (m+n + 2)PG + ITQm + 2n + 4) > ot-i3 


~ TD(Qm + 2)P(m + n + 3 + 1) j=l 
T(Qm + i —j+3)P(m+n+i—j+3) 
T(2m + 2n +74 —j + 5)T — 7 + 2) 
Computations of a similar nature with s = 3 and 4 in (7.9) and further evaluation 
of the central moments have yielded the following results [3]: 





(7.12) $\mu_1^{\prime(s)} = \dfrac{s(2m + s + 1)}{2(m + n + s + 1)} \qquad (s = 1, 2, 3, 4),$ 

(7.13) $\mu_2^{(s)} = \dfrac{s(2m + s + 1)(2n + s + 1)(2m + 2n + s + 2)}{4(m + n + s + 1)^2 (m + n + s + 2)(2m + 2n + 2s + 1)},$ 
and 
(7.14) $\mu_3^{(s)} = s(n - m)(2m + s + 1)(2n + s + 1)(m + n + 1)(2m + 2n + s + 2)/d,$ 


where 
$d = (m + n + s + 1)^3 (m + n + s + 2)(m + n + s + 3)(2m + 2n + 2s)(2m + 2n + 2s + 1).$ 


For the corresponding $\mu_4^{(s)}$, the reader is referred to [3]. 
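
The recurrence (7.10) and the closed forms (7.12) to (7.14) can be compared directly for s = 2. The following Python sketch is an editorial illustration, not part of the original paper: it generates the raw moments of $V^{(2)}$ from (7.10) in exact rational arithmetic, converts them to central moments, and evaluates the closed forms as we read them from the text, including the powers of $(m + n + s + 1)$ in the denominators, which are only partly legible in this copy. All function names are hypothetical.

```python
from fractions import Fraction


def raw_moments_two_roots(m, n, order):
    """Raw moments mu'_0, ..., mu'_order of V = theta_1 + theta_2 from (7.10)."""
    m, n = Fraction(m), Fraction(n)
    mu = [Fraction(1)]
    ratio = Fraction(1)  # (2m+2)...(2m+i+1) / ((2m+2n+4)...(2m+2n+i+3))
    for i in range(1, order + 1):
        ratio *= (2 * m + i + 1) / (2 * m + 2 * n + i + 3)
        rhs = 2**i * ratio * (m + n + 2)
        mu.append((rhs + i * mu[i - 1]) / (m + n + i + 2))
    return mu


def central_from_raw(mu):
    """Mean, variance and third central moment from raw moments mu[0..3]."""
    mean = mu[1]
    var = mu[2] - mean**2
    third = mu[3] - 3 * mean * mu[2] + 2 * mean**3
    return mean, var, third


def closed_form_moments(s, m, n):
    """Mean, variance and third central moment as read from (7.12)-(7.14)."""
    s, m, n = Fraction(s), Fraction(m), Fraction(n)
    a = m + n + s + 1
    mean = s * (2 * m + s + 1) / (2 * a)
    var = (s * (2 * m + s + 1) * (2 * n + s + 1) * (2 * m + 2 * n + s + 2)
           / (4 * a**2 * (a + 1) * (2 * m + 2 * n + 2 * s + 1)))
    d = a**3 * (a + 1) * (a + 2) * (2 * m + 2 * n + 2 * s) * (2 * m + 2 * n + 2 * s + 1)
    third = (s * (n - m) * (2 * m + s + 1) * (2 * n + s + 1)
             * (m + n + 1) * (2 * m + 2 * n + s + 2) / d)
    return mean, var, third
```

For m = 1 and n = 0, for instance, the recurrence gives raw moments 5/4, 23/14 and 9/4, so that the variance is 9/112 and the third central moment is −1/224, in agreement with the closed forms at s = 2.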
In addition to the usefulness of Theorem 3 in studying the moments of the 
sum of the roots as outlined above, this theorem is also useful for evaluating the 
cumulative distribution function of the largest root, $\theta_s$, of the determinantal 
equation. For the latter purpose, in Theorem 3, multiply 


$U(x; q_k, r; q_{k-1}, r; \ldots; q_1, r; t)$ 


by C(s, m, n) given in (7.2), after making the following substitutions: 


(7.15) $q_j = m + j - 1$, $r = n$, $k = s$, $x_i = \theta_i$, and $t = 0$. 


In this case, the C term of Theorem 3 alone vanishes. By means of Theorem 3, 
we reduce the cumulative distribution function involving a pseudo-determinant 
of order s in terms of those of orders (s − 1) and (s − 2). Since it has been 
shown [3] that the cdf of the smallest root can be obtained from that of the 
largest, Theorem 3 is thus useful in obtaining the cdf of either of these roots. 
Again if we wish to study the moments of the criterion 


$U^{(s)} = \sum_{i=1}^{s} \theta_i/(1 - \theta_i),$ 


by using Theorem 4, we will arrive at the following result: 


(n’ ana S + E(e"“”’) 4 tE(Ue 7) 


s—l 
= {2k(s, m, n’)/K(s — 2, m, n’)} >, (—1)""" 


j=1 
F(o;2m + 8+ 7 — 2,2n — 1, — 2t) 
E{(D Mes Mae}, 


where $\lambda_i = \theta_i/(1 - \theta_i)$, which transformation in (7.1) gives $K(s, m, n')$ from 
$C(s, m, n)$, and $n' = m + n + s + 1$. 

For a detailed study of these applications in multivariate analysis, the reader 
is referred to [3]. 
THE NONEXISTENCE OF CERTAIN STATISTICAL PROCEDURES IN 
NONPARAMETRIC PROBLEMS¹ 


By R. R. BAHADUR AND LEONARD J. SAVAGE 
The University of Chicago 


1. Introduction. It seems plausible that if the population distribution of a real 
random variable is entirely unknown, then a sample from the population can 
yield little or no information about the tails of the distribution, even if the 
sample is obtained according to a sequential procedure. This paper gives evi- 
dence supporting and clarifying this proposition. 

The paper treats in some detail problems of inference concerning the population 
mean $\mu$. It is shown that there is neither an effective test of the hypothesis 
that $\mu = 0$, nor an effective confidence interval for $\mu$, nor an effective point 
estimate of $\mu$. These conclusions concerning $\mu$ flow from the fact that $\mu$ is sensitive 
to the tails of the population distribution; parallel conclusions hold for other 
sensitive parameters, and they can be established by the same methods as are 
here used for $\mu$. 

It is also shown that there exists no confidence band for the population dis- 
tribution function such that the upper and lower limits of the band are them- 
selves distribution functions; that is, no confidence band fits very well. 


2. Theorems. Let $\mathcal{F}$ be a given set of distribution functions F, G, ... of a real 
variable. Some of the theorems to be proved would be of interest even if $\mathcal{F}$ were 
required to be the class of all distributions or perhaps all distributions F with 
finite mean $\mu_F$. But it is helpful to recognize that the proofs require only that $\mathcal{F}$ 
have a certain richness. Specifically, Theorem 1 and Corollaries 1 through 4 
depend on the following three hypotheses: 


(i) For every $F \in \mathcal{F}$, $\mu_F = \int_{-\infty}^{\infty} x\,dF$ exists and is finite. 
(ii) For every real m, there is an $F \in \mathcal{F}$ with $\mu_F = m$. 
(iii) $\mathcal{F}$ is convex; that is, if $F \in \mathcal{F}$, $G \in \mathcal{F}$, $\pi$ is a positive fraction, and 
$H = \pi F + (1 - \pi)G$, then $H \in \mathcal{F}$. 


Theorem 2 depends on hypotheses (iii) and the following: 


(iv) $\mathcal{F}$ is closed under translation; that is, if $F \in \mathcal{F}$, and $G(x) = F(x - h)$ for all 
x and some h, then $G \in \mathcal{F}$. 
(v) $\mathcal{F}$ is nonvacuous. 


Some obvious examples of sets satisfying all four conditions are the sets of all 
distribution functions F such that $\mu_F$ is finite; the points of increase of F are a 
bounded set, or are a finite set; F is absolutely continuous and $dF/dx$ vanishes 
outside of a bounded interval; as x approaches $\infty$, $1 - F(x) + F(-x) = O(x^{-r})$ 
for an $r > 1$. 

Since the theorems to be proved are theorems of nonexistence, it is appropriate 
that they be stated and proved for mixed (i.e., randomized) procedures: sampling, 
estimating, testing. They are, of course, true a fortiori for the smaller class of 
pure procedures. The technique of working with mixed procedures is presented in 
detail in certain publications, for example [1] and [2]. We feel free, therefore, to 
handle mixed procedures rather informally, to save space and tedium. 

Let $X_1, X_2, \ldots$ denote an infinite sequence of independent random variables, 
each distributed according to F; that is, $\Pr(X_i \le x) = F(x)$. Suppose that a 
(randomized, sequential) sampling procedure is given, that is, a set of rules for 
observing $X_1, X_2, \ldots$ one by one up to a certain stage N such that at each 
stage the decision whether to continue depends (randomly) on the observed 
values in hand at that stage. The given procedure, which will remain fixed 
throughout the discussion, is naturally assumed to be closed, that is, 


(1) $P_F(N < \infty) = 1$ 


for each $F \in \mathcal{F}$. Except for this condition, the sampling procedure is arbitrary. 

Denote the total outcome of the sampling procedure, regarded as a random 
variable, by V, that is, $V = (X_1, X_2, \ldots, X_N)$. As already exemplified in (1), 
for any event A defined on the sample space of V, $P_F(A)$ will denote the 
probability of A when F obtains, that is to say, when each $X_i$ is distributed according 
to F. If $\varphi$ is a real valued function of V, $E_F[\varphi]$ will denote the expected value of $\varphi$ 
(if it exists) when F obtains. 

For any real number m, let $\mathcal{F}_m$ denote the set of all $F \in \mathcal{F}$ with $\mu_F = m$. 

THEOREM 1. For each bounded real valued function $\varphi$ on the sample space of V, 
$\inf_{F \in \mathcal{F}_m} E_F[\varphi]$ and $\sup_{F \in \mathcal{F}_m} E_F[\varphi]$ are independent of m. 

The proofs of this theorem and of Theorem 2 below are postponed to the next 
section. Theorem 1 states, in effect, that even if $\mu_F$ is known to equal one of two 
given values $m_1$ and $m_2$, the sample V cannot provide effective discrimination 
between the two hypothetical values. The following Corollaries 1 through 4 
exploit the close relations between discrimination, testing, and estimation to 
make explicit some consequences of Theorem 1 in problems of inference 
concerning $\mu_F$. As was mentioned in the introduction, analogues of Theorem 1 
(and therewith of Corollaries 1 through 4) are valid for parameters other than the 
mean, and these analogues can be proved by the same method as is used in the 
next section to prove Theorem 1. 

Let H be the hypothesis that $\mu_F = 0$ (i.e., $F \in \mathcal{F}_0$). For any test t, let $\beta_F(t)$ 
denote the probability of rejecting H in using t when F obtains, in short, the 
power function of t. Call t a somewhere unbiased level-$\alpha$ test if $\beta_F \le \alpha$ for $F \in \mathcal{F}_0$ 
and, for some m different from zero, $\beta_F \ge \alpha$ for $F \in \mathcal{F}_m$. Call t a similar level-$\alpha$ 
test if $\beta_F = \alpha$ for each $F \in \mathcal{F}_0$. 

Taking $\varphi(V)$ to be the probability prescribed by t of rejecting H on observation 
of V yields this corollary. 







COROLLARY 1. If t is a somewhere unbiased level-$\alpha$ test of H, or a similar level-$\alpha$ 
test of H, then $\beta_F(t) = \alpha$ for all $F \in \mathcal{F}$. 

Corollary 1 asserts the failure, in certain senses, of all tests of the value of $\mu$, 
assuming that $\mu$ exists. It would be interesting to know whether, in comparable 
nonparametric situations, tests of the existence of $\mu$ are equally unsuccessful. 
To be precise, suppose for example that $\mathcal{F}$ is the set of all distribution functions. 
Let $\mathcal{F}^*$ be the subset of $\mathcal{F}$ on which $\mu_F$ exists finitely, and let $H^*$ denote the 
hypothesis that F is in $\mathcal{F}^*$. Then, does Corollary 1 hold with H replaced by $H^*$ and 
“somewhere unbiased” replaced by “unbiased”? 

Next, let I be a confidence set for $\mu_F$, that is, I is a (randomized) function of 
V that has Borel subsets of the real line for its values. For any real m, let $C[m]$ 
denote the event that I covers m. 

COROLLARY 2. If $P_F(C[\mu_F]) \ge 1 - \alpha$ for all $F \in \mathcal{F}$, then $P_F(C[m]) \ge 1 - \alpha$ 
for all m and all $F \in \mathcal{F}$. 

PROOF. For each m, let $p_m(V)$ be the conditional probability of $C[m]$ given V, 
$0 \le p_m \le 1$. Consider a fixed m. By hypothesis, $E_F[p_m] \ge 1 - \alpha$ for $F \in \mathcal{F}_m$. Hence, 
$P_F(C[m]) = E_F[p_m] \ge 1 - \alpha$ for all $F \in \mathcal{F}$, by Theorem 1. Since m is arbitrary, 
the corollary is proved. 

COROLLARY 3. Suppose that there exists at least one $F \in \mathcal{F}$ such that $P_F$(I is a set 
bounded from below) = 1. Then $\inf_{F \in \mathcal{F}} \{P_F(C[\mu_F])\} = 0$. 

PROOF. For each n = 1, 2, ..., let $B_n$ denote the event that I is contained in 
the interval $[-n, \infty)$, and let $\bar{B}_n$ denote the complement of $B_n$. For each n, let 
$q_n(V)$ denote the probability of $B_n$ given V; $0 \le q_n \le q_{n+1} \le 1$. 

Now let F be a distribution in $\mathcal{F}$ such that I is bounded from below with 
probability 1 when F obtains. By Lebesgue’s theorem for monotone sequences, 
$E_F[\lim_n q_n] = \lim_n E_F[q_n] = \lim_n P_F(B_n) = P_F$(I is bounded from below) = 1. 
Consequently, $\lim_n q_n(V) = 1$ except on a set of points V of $P_F$-measure zero. 
Since, for any $m < -n$, $p_m(V) = \Pr(m \in I \mid V) \le \Pr(\bar{B}_n \mid V) = 1 - q_n(V)$, it 
follows that, except on a $P_F$-null set, 


(2) $\lim_{m \to -\infty} p_m(V) = 0.$ 

Since $P_F(C[m]) = E_F[p_m]$ for all m, it follows from (2), by Lebesgue’s theorem 
for boundedly convergent sequences, that 
(3) $\lim_{m \to -\infty} P_F(C[m]) = 0.$ 

Now, Corollary 2 states in effect that $\inf_{G \in \mathcal{F}} \{P_G(C[\mu_G])\} = \inf_{G \in \mathcal{F}} 
\inf_m \{P_G(C[m])\}$. It follows from (3) that the common value of these infima is 
zero. This completes the proof. 

Of course, “set bounded from above,” and, a fortiori, “bounded set” can be 
substituted for “set bounded from below” in the statement of Corollary 3. But 
the following example shows that it would not be enough to say “set bounded 
from above or from below.” For all V, let $I = (-\infty, 0]$ with probability ½ and 
$I = (0, \infty)$ with probability ½; then $P_F(C[m]) = ½$ for all m and all F. 
Next, consider the problem of constructing a suitable point estimator for $\mu_F$. 
Let M be an estimator, that is, a real valued (randomized) function of V. Suppose 
that when F obtains, the expected loss in using M is $E_F[L(M - \mu_F)] = 
r_F(M)$, where $L(m)$ is bounded from below and $\lim_{m \to \infty} L(m) = \infty$ or 
$\lim_{m \to -\infty} L(m) = \infty$ (e.g., $L(m) = |m|$, $L(m) = m^2$, $L(m) = (2 + \sin m)e^{m}$). 

Let $\rho(F)$ be a real valued functional on $\mathcal{F}$. Say that $\rho$ is uncontrollable (from 
above) if there exists no real valued (randomized) function of V, say S, such that 
$\inf_{F \in \mathcal{F}} \{P_F(\rho(F) \le S)\} > 0$. 

The following corollary shows that there is no estimator M for which the 
expected loss $r_F(M)$ is bounded in F, nor even one for which the sample gives 
any clue as to the possible expected loss. 

COROLLARY 4. For any estimator M, $r_F(M)$ is uncontrollable. 

PROOF. There is no loss in generality in assuming that $\lim_{m \to \infty} L(m) = \infty$. 
Replacing $L(m)$ by $L(m) - \inf_a L(a)$, there is also no loss in assuming L 
nonnegative, with $\inf_m L(m) = 0$. Consider a fixed estimator M. Write $L_F = 
L(M - \mu_F)$. Since $L_F \ge 0$, it is easily seen (à la Tchebycheff) by considering the 
cases $r_F = 0$, $0 < r_F < \infty$, and $r_F = \infty$ separately that $P_F(L_F \le a r_F) \ge 1 
- (1/a)$ for all $a > 0$ and all F. 

Suppose, contrary to Corollary 4, that there exists a random variable S with 
distribution determined by V, and a positive constant $\beta$, such that $P_F(r_F \le S) \ge 
\beta$ for all $F \in \mathcal{F}$. There is no loss of generality in assuming that S is always positive. 
Choose and fix an $a > 0$ such that $\beta - (1/a) > 0$. Let $Y = \sup\{m : L(m) \le aS\}$ 
and define I to be the random interval $[M - Y, \infty)$. Then $P_F$(I is bounded from 
below) = 1 for each F. Also, for each $F \in \mathcal{F}$, 

$$\begin{aligned} P_F(C[\mu_F]) &= P_F(M - \mu_F \le Y) \\ &\ge P_F(L_F \le aS) \\ &\ge P_F(L_F \le aS,\ r_F \le S) \\ &\ge P_F(L_F \le a r_F,\ r_F \le S) \\ &\ge P_F(L_F \le a r_F) + P_F(r_F \le S) - 1 \\ &\ge 1 - (1/a) + \beta - 1 \\ &> 0. \end{aligned}$$


This contradiction to Corollary 3 establishes Corollary 4. 

The preceding proof consists in showing that if M is an estimator such that 
$r_F(M)$ is controllable, then $\mu_F$ is controllable, contrary to Corollary 3. This 
argument can also be used to show the uncontrollability of certain parameters. 
Simple examples of such parameters are the variance of F, the difference between 
the mean and median values of F, and the supremum of the points of increase of 
F. Note that while the unboundedness of these parameters is evident when 
assumptions such as (iii) and (iv) hold, verification that they are uncontrollable 
is less trivial even in the case when V consists of a single observation. 







Finally, let $A(x)$ be a (randomized) function of V taking values in the set of all 
distribution functions of x. Let $C^*[F]$ denote the event that $A(x) \le F(x)$ for 
all x. 

THEOREM 2. $\inf_{F \in \mathcal{F}} \{P_F(C^*[F])\} = 0$. 

Application of Theorem 2 to $-X_i$ yields with little effort a similar theorem, 
dual to Theorem 2, concerning the probability that $A(x) \ge F(x)$ for all x. 
Obviously these two theorems together imply a two-sided version of Theorem 2. 


3. Proofs of the theorems. The proofs of the theorems depend on the fact that 
a given distribution function F can be so modified that, while the probability 
distribution of the $X_i$'s (and therewith of $V$) is perturbed only slightly,
parameters such as the mean suffer arbitrary displacements. This modification is
described in the following paragraphs, before undertaking the proofs of Theorems 
1 and 2. 

Let $\Phi$ denote the class of all functions $\varphi$ of $V$ with $0 \le \varphi \le 1$, and
(for any two distribution functions $F$ and $G$) define the familiar
absolute-variational distance between $F$ and $G$ by

(4)    $\delta(F, G) = \sup_{\varphi \in \Phi} \left| E_F[\varphi] - E_G[\varphi] \right|.$

Given $F$, let $H$ be an arbitrary distribution function and $\pi$ an arbitrary
constant, $0 < \pi < 1$, and define the distribution function $G$ thus:

(5)    $G(x) = \pi F(x) + (1 - \pi) H(x).$


The following lemma shows that if the given sampling procedure is closed for
$F$ in the sense of (1), and if $\pi$ is sufficiently close to 1, then, no matter what $H$
may be, the probability distributions of $V$ under $F$ and $G$ are not very distant
from one another. It may clarify the meaning and proof of the lemma to remark
that it is for this application of the lemma, not for the lemma itself, that the
sampling procedure must be closed for $F$.

LEMMA. $\delta(F, G) \le 1 - \pi^k P_F(N \le k)$ for each positive integer $k$.²

PROOF. Choose and fix a positive integer $k$. Let $R^{(k)}$ denote the space of all
points $x^{(k)} = (x_1, x_2, \dots, x_k)$ with $-\infty < x_i < \infty$ for $i = 1, 2, \dots, k$. For any
univariate distribution function $F(x)$, write $F^{(k)}(x^{(k)}) = \prod_{i=1}^{k} F(x_i)$.

It will be shown first that if $F$ and $G$ are related according to (5), then, for any
nonnegative function $f$ on $R^{(k)}$,

(6)    $\int_{R^{(k)}} f \, dG^{(k)} \ge \pi^k \int_{R^{(k)}} f \, dF^{(k)}.$

To verify this inequality, let $Y_1, Y_2, \dots, Y_k$, $Z_1, Z_2, \dots, Z_k$, and
$U_1, U_2, \dots, U_k$ be independent random variables such that each $Y_i$ is distributed
according to $F$, each $Z_i$ according to $H$, and $P(U_i = 1) = 1 - P(U_i = 0) = \pi$
for each $U_i$. Write $W_i = U_i Y_i + (1 - U_i) Z_i$ for $i = 1, 2, \dots, k$. Then
$W_1, W_2, \dots, W_k$ are independent random variables, each distributed according to
$G$ as defined by (5). Let $B$ denote the event that $U_i = 1$ for all $i = 1, 2, \dots, k$
and let $\bar{B}$ denote the complement of $B$. Now it is straightforward to show (6);
thus,

\[
\begin{aligned}
\int_{R^{(k)}} f \, dG^{(k)} &= E[f(W_1, \dots, W_k)] \\
  &= P(B)\,E[f(W_1, \dots, W_k) \mid B] + P(\bar{B})\,E[f(W_1, \dots, W_k) \mid \bar{B}] \\
  &\ge P(B)\,E[f(W_1, \dots, W_k) \mid B] \\
  &= \pi^k E[f(Y_1, \dots, Y_k) \mid B] \\
  &= \pi^k E[f(Y_1, \dots, Y_k)] \\
  &= \pi^k \int_{R^{(k)}} f \, dF^{(k)}.
\end{aligned}
\]

² These inequalities are a considerable improvement of the corresponding ones in an
earlier version of the lemma, and the proof is somewhat similar. The authors are indebted
to Professor W. Hoeffding for these improvements.
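
Inequality (6) can be checked numerically when $F$ and $H$ are discrete. The following is only a sketch under that assumption: the helper names `product_integral` and `mixture`, the test function `f`, and all numerical values are hypothetical and are not taken from the paper.

```python
from itertools import product


def product_integral(f, dist, k):
    """Integral of f against the k-fold product of a discrete distribution.

    `dist` maps support points to probabilities; `f` takes a k-tuple.
    """
    total = 0.0
    for point in product(dist, repeat=k):
        weight = 1.0
        for x in point:
            weight *= dist[x]
        total += f(point) * weight
    return total


def mixture(F, H, pi):
    """G = pi*F + (1 - pi)*H, as in (5), for discrete distributions."""
    support = set(F) | set(H)
    return {x: pi * F.get(x, 0.0) + (1.0 - pi) * H.get(x, 0.0) for x in support}


# Hypothetical illustration values (assumptions, not from the paper).
F = {0: 0.5, 1: 0.5}
H = {100: 1.0}          # H sits far from F, so the mean of G is displaced
pi, k = 0.9, 3
G = mixture(F, H, pi)


def f(point):
    """Any nonnegative function on k-tuples will do."""
    return float(max(point))


lhs = product_integral(f, G, k)             # integral of f dG^(k)
rhs = pi ** k * product_integral(f, F, k)   # pi^k times integral of f dF^(k)
print(lhs >= rhs)
```

The enumeration is exponential in $k$, so this is suitable only for very small examples.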


Consider the space of all sequences $X^{(\infty)} = (X_1, X_2, \dots \text{ ad inf})$. Since $V$, the
observed sample, is by definition a (randomized) function of $X^{(\infty)}$, it makes sense
to speak of the conditional distribution of $V$ given $X^{(k)} = (X_1, \dots, X_k)$. It
follows from a well known property of conditional expectation that, for any
function $h(V)$ and any $F$,

(8)    $E_F[h] = \int_{R^{(k)}} E_F[h \mid X^{(k)}] \, dF^{(k)},$

provided that $E_F[|h|]$ exists.

Next, let $\varphi$ be a function of $V$ such that $0 \le \varphi \le 1$. Define $\psi(V) = 1$ if
$N \le k$ and $\psi(V) = 0$ if $N > k$. It is easy to see that there exists a function $f$
on $R^{(k)}$ such that $0 \le f \le 1$, and

(9)    $E_F[\varphi \cdot \psi \mid X^{(k)}] = f(X^{(k)}),$

for all $F$. The function $f$ depends, of course, on the given $\varphi$ and the given sampling
procedure.

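Suppose, now, that $F$ and $G$ are two distribution functions related according
to (5). Then,

\[
\begin{aligned}
E_G[\varphi] &\ge E_G[\varphi \cdot \psi] \\
  &= \int_{R^{(k)}} E_G[\varphi \cdot \psi \mid X^{(k)}] \, dG^{(k)} && \text{by (8)} \\
  &= \int_{R^{(k)}} f \, dG^{(k)} && \text{by (9)} \\
  &\ge \pi^k \int_{R^{(k)}} f \, dF^{(k)} && \text{by (6)} \\
  &= \pi^k \int_{R^{(k)}} E_F[\varphi \cdot \psi \mid X^{(k)}] \, dF^{(k)} \\
  &= \pi^k E_F[\varphi \cdot \psi] \\
  &= E_F[\varphi] - E_F[\varphi (1 - \pi^k \psi)] \\
  &\ge E_F[\varphi] - E_F[1 - \pi^k \psi] \\
  &= E_F[\varphi] - 1 + \pi^k P_F(N \le k).
\end{aligned}
\]

Thus,

(10)    $E_F[\varphi] - E_G[\varphi] \le 1 - \pi^k P_F(N \le k)$

for all $\varphi$ in $\Phi$. Since $\varphi \in \Phi$ implies $1 - \varphi \in \Phi$, it follows from (10) that

(11)    $-E_F[\varphi] + E_G[\varphi] \le 1 - \pi^k P_F(N \le k)$

for all $\varphi$ in $\Phi$. In view of (10), (11), and the definition (4) of $\delta$,
$\delta(F, G) \le 1 - \pi^k P_F(N \le k)$ for all $k$, as was to be proved.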

PROOF OF THEOREM 1. Let $m$ and $m'$ be real numbers, and let $\epsilon > 0$ be given.
Consider a fixed $F$ in $\mathcal{F}_m$. Choose and fix a positive integer $k$ such that

(12)    $P_F(N > k) < \epsilon.$

The existence of such a $k$ is assured by (1). Now choose and fix a $\pi$ such that
$0 < \pi < 1$ and

(13)    $(1 - \pi^k) < \epsilon.$

Let $H$ be a distribution function in $\mathcal{F}$ such that $\pi m + (1 - \pi)\mu_H = m'$ (see
assumption (ii)), and let $G$ be defined by (5). Then, by assumption (iii), $G$ is in
$\mathcal{F}$, and since $\mu_G = \pi \mu_F + (1 - \pi)\mu_H = m'$, $G$ is in $\mathcal{F}_{m'}$. Since
$1 - \pi^k P_F(N \le k) \le (1 - \pi^k) + P_F(N > k)$, it follows from (12), (13), and the
Lemma that $\delta(F, G) < 2\epsilon$.

Since $\epsilon$ and $F$ are arbitrary, $\inf_{G \in \mathcal{F}_{m'}} [\delta(F, G)] = 0$ for each
$F \in \mathcal{F}_m$. In other words, $\mathcal{F}_{m'}$ is everywhere dense in $\mathcal{F}_m$, under the
metric $\delta$. Since $m$ and $m'$ are arbitrary, it follows (see assumption (i)) that, for
each $m$, $\mathcal{F}_m$ is everywhere dense in $\mathcal{F}$. This conclusion, together with the
observation that $E_F[\varphi]$ is continuous in $F$ for any bounded $\varphi$, yields Theorem 1.
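
The choice of $\pi$ and $H$ in this proof can be made concrete. The sketch below assumes a fixed sample size $n$ (so that $N = n$ always and (12) holds trivially with $k = n$); the function name `displacement` and the particular rule for picking $\pi$ are illustrative assumptions, not the authors' prescription.

```python
def displacement(m, m_prime, eps, n):
    """Pick pi and the mean of H as in the proof of Theorem 1.

    Assumes a fixed sample size n, so P_F(N > k) = 0 for k = n.
    Returns (pi, mu_H, bound), where bound = 1 - pi**k is the Lemma's
    upper bound on delta(F, G) in this fixed-sample-size case.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    k = n
    # (13) requires 1 - pi**k < eps, i.e. pi > (1 - eps)**(1/k).
    # Taking the midpoint between that threshold and 1 is an arbitrary choice.
    pi = (1.0 + (1.0 - eps) ** (1.0 / k)) / 2.0
    # H must satisfy pi*m + (1 - pi)*mu_H = m'.
    mu_H = (m_prime - pi * m) / (1.0 - pi)
    bound = 1.0 - pi ** k
    return pi, mu_H, bound


pi, mu_H, bound = displacement(m=0.0, m_prime=1000.0, eps=0.01, n=50)
print(pi, mu_H, bound)
```

The mean of $G$ is moved from $m$ to $m'$ while the bound on $\delta(F, G)$ stays below $\epsilon$; the price is that $\mu_H$ must be very far from $m$.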

PROOF OF THEOREM 2. Before the proof proper we present a line of argument
that may be of some interest in suggesting a heuristic connection between this
theorem and Theorem 1, though this line of argument makes assumptions that
are actually gratuitous. It assumes in fact that (i) obtains and that, for some
$F$, the mean of $A$ almost always exists and is finite.

Suppose, then, that the random distribution function $A$ is such that, for some
$F \in \mathcal{F}$,

(14)    $-\infty < \int_{-\infty}^{\infty} x \, dA < \infty$

except for a $P_F$-null event. Define $J = [\int_{-\infty}^{\infty} x \, dA, \infty)$ whenever (14) is
satisfied, and $J = (-\infty, \infty)$ otherwise. Now, the event $A(x) \ge F(x)$ for all $x$
(that is, $C^*[F]$) implies the event $\infty > \mu_F \ge \int_{-\infty}^{\infty} x \, dA$ (that is, $C[\mu_F]$)
provided only that $\mu_F$ exists and is finite. Hence $P_F(C^*[F]) \le P_F(C[\mu_F])$ for each
$F \in \mathcal{F}$. The desired conclusion now follows from Corollary 3.

Now, dropping the assumptions (i) and (14), turn to the proof proper. Choose
and fix an $\epsilon$, $0 < \epsilon < 1$, and an $F \in \mathcal{F}$ such that $F(0) > 0$. The existence of
such an $F$ is assured by assumptions (iv) and (v). For each $x$, let
$J(x) = \inf\{u : P_F(A(x) \le u) \ge 1 - \epsilon\}$. It is not difficult to see that $J$ is a
nondecreasing function of $x$, with $\lim_{x \to -\infty} J(x) = 0$, $\lim_{x \to \infty} J(x) = 1$, and that
$J$ is also continuous from the right, so that it is actually a distribution function.
Also,

(15)    $P_F\{A(x) > J(x)\} \le \epsilon$

for each $x$.

Now choose $k$ such that (12) holds, choose $\pi$ such that (13) holds, and choose
$\lambda$ such that $J(\lambda) < (1 - \pi)F(0)$. Let $G$ be defined by (5), with $H(x) = F(x - \lambda)$.
Then $G$ is in $\mathcal{F}$, by assumptions (iii) and (iv), and

(16)    $J(\lambda) < G(\lambda),$

by the choice of $\lambda$ and the definition of $G$. Hence

(17)
\[
\begin{aligned}
P_G(C^*[G]) &= P_G(A(x) \ge G(x) \text{ for all } x) \\
  &\le P_G(A(\lambda) \ge G(\lambda)) \\
  &\le P_G(A(\lambda) > J(\lambda)) && \text{by (16)} \\
  &\le P_F(A(\lambda) > J(\lambda)) + P_F(N > k) + (1 - \pi^k) && \text{by the lemma} \\
  &\le 3\epsilon && \text{by (12), (13), (15).}
\end{aligned}
\]

Since $\epsilon$ is an arbitrary positive fraction, the theorem is proved.
The proof of Theorem 2 does not use quite the full force of (iii) and (iv).
It is enough that for some $F \in \mathcal{F}$ and two sequences of numbers $\alpha_i$ $(0 \le \alpha_i < 1)$
and $\beta_j$, such that $\alpha_i \to 1$ and $\beta_j \to \infty$, the distributions $G_{ij}$ such that

(18)    $G_{ij}(x) = \alpha_i F(x) + (1 - \alpha_i) F(x + \beta_j)$

are in $\mathcal{F}$. For the dual of Theorem 2, it is required instead that $\beta_j \to -\infty$.

ASYMPTOTIC DISTRIBUTIONS OF TWO GOODNESS OF FIT CRITERIA


By Patrick Billingsley
Washington, D. C.


1. Results. Let $\{X_1, X_2, \dots\}$ be a stochastic process in which each random
variable takes as values only the integers $1, 2, \dots, s$. To test the null hypothesis
that the process is independent and stationary with $P\{X_n = k\} = p_k > 0$, it
is natural to form the statistic

(1.1)    $\sum_{u_1, \dots, u_\nu = 1}^{s} \frac{(n_{u_1 \cdots u_\nu} - n p_{u_1} \cdots p_{u_\nu})^2}{n p_{u_1} \cdots p_{u_\nu}},$

where $n_{u_1 \cdots u_\nu}$ is the number of integers $m \le n$ for which $(X_m, \dots, X_{m+\nu-1})$
is the $\nu$-tuple $(u_1, \dots, u_\nu)$. In Section 2 we show that under the null hypothesis
the distribution function of (1.1) approaches, as $n \to \infty$, the distribution function

(1.2)    $\mathop{*}\limits_{\lambda=1}^{\nu-1} K_{s^{\nu-\lambda-1}(s-1)^2}(x/\lambda) * K_{s-1}(x/\nu),$

where $K_r(x)$ is the chi-square distribution with $r$ degrees of freedom and the
first $*$ denotes iterated convolution in the obvious way. Good [1], using different
methods, has obtained this result for the special case in which the $p_k$ are all
equal and $s$ is a prime number.
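
As an illustration of how (1.1) is computed from data, here is a minimal sketch. The function name `serial_statistic` and its argument conventions are assumptions made for this example; in particular the sequence is taken to have at least $n + \nu - 1$ terms so that all $n$ overlapping $\nu$-tuples exist.

```python
from collections import Counter
from itertools import product


def serial_statistic(xs, probs, nu, n=None):
    """Statistic (1.1) for a sequence xs with values in 1..s.

    probs[k-1] is the hypothesised p_k.  The n overlapping nu-tuples
    (X_m, ..., X_{m+nu-1}), m = 1..n, are counted, so xs must contain
    at least n + nu - 1 terms.  By default n = len(xs) - nu + 1.
    """
    s = len(probs)
    if n is None:
        n = len(xs) - nu + 1
    if n < 1 or len(xs) < n + nu - 1:
        raise ValueError("sequence too short for the requested n and nu")
    counts = Counter(tuple(xs[m:m + nu]) for m in range(n))
    total = 0.0
    for u in product(range(1, s + 1), repeat=nu):
        expected = float(n)
        for ui in u:
            expected *= probs[ui - 1]          # n * p_{u_1} ... p_{u_nu}
        total += (counts.get(u, 0) - expected) ** 2 / expected
    return total


# Hypothetical toy data: s = 2, nu = 2, n = 4 overlapping pairs.
print(serial_statistic([1, 2, 1, 2, 1], [0.5, 0.5], nu=2))
```

The loop visits all $s^\nu$ cells, including empty ones, since cells with zero count still contribute $n p_{u_1} \cdots p_{u_\nu}$ to the sum.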

If the $p_k$ are estimated by $n_k/n$, there results the statistic

(1.3)    $\sum_{u_1, \dots, u_\nu = 1}^{s} \frac{(n_{u_1 \cdots u_\nu} - n^{-\nu+1} n_{u_1} \cdots n_{u_\nu})^2}{n^{-\nu+1} n_{u_1} \cdots n_{u_\nu}}.$

In Section 3 we show that under the hypothesis that $\{X_n\}$ is stationary and
independent, the distribution function of (1.3) approaches, as $n \to \infty$, the
distribution function

(1.4)    $\mathop{*}\limits_{\lambda=1}^{\nu-1} K_{s^{\nu-\lambda-1}(s-1)^2}(x/\lambda).$

In the special case $\nu = 2$ this result is implicit in the work of Hoel [2]. Note
that in this case (1.4) becomes $K_{(s-1)^2}(x)$.
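
A corresponding sketch for (1.3), in which the $p_k$ are replaced by the observed frequencies $n_k/n$. The function name `serial_statistic_estimated` is hypothetical, and skipping cells whose estimated expectation is zero is a convention adopted here for the example, not something stated in the paper.

```python
from collections import Counter
from itertools import product


def serial_statistic_estimated(xs, s, nu):
    """Statistic (1.3): (1.1) with p_k replaced by n_k / n.

    n_k counts the m <= n with X_m = k, where n = len(xs) - nu + 1.
    """
    n = len(xs) - nu + 1
    if n < 1:
        raise ValueError("sequence too short for the requested nu")
    singles = Counter(xs[:n])                                  # n_k
    tuples = Counter(tuple(xs[m:m + nu]) for m in range(n))    # n_{u_1...u_nu}
    total = 0.0
    for u in product(range(1, s + 1), repeat=nu):
        expected = float(n) ** (1 - nu)
        for ui in u:
            expected *= singles.get(ui, 0)     # n^{-nu+1} n_{u_1} ... n_{u_nu}
        if expected == 0.0:
            continue  # assumption: cells with zero estimated expectation are skipped
        total += (tuples.get(u, 0) - expected) ** 2 / expected
    return total


print(serial_statistic_estimated([1, 2, 1, 2, 1], s=2, nu=2))
```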

The means and variances of the distributions (1.2) and (1.4) are easily written
down. It is obvious that if $\nu$ is fixed and $s \to \infty$, then these distributions are,
when normed by their means and standard deviations, asymptotically normal.
It is a simple matter to show, using Ljapunov's condition and the fact that the
distributions are convolutions, that the same thing is true if $s$ is fixed and
$\nu \to \infty$. By interpolation in the tables of [3] one can get an approximation to
(1.2) for the case $s = 2$ and $\nu = 2$ and an approximation to (1.4) for the case
$s = 2$ and $\nu = 3$. The paper [3] also deals with the general problem of computing
and approximating the distributions of weighted sums of independent
chi-square-distributed random variables.

The author wishes to thank E. H. Spanier for some essential ideas on the 
algebraic aspects of this work. 

In his paper "On the serial test for random sequences," forthcoming in this
journal, I. J. Good shows that the expected value of the statistic (1.1) and the
first moment of its limiting distribution (1.2) have the same value, viz., $s^\nu - 1$.


2. Asymptotic distribution of (1.1). In what follows we make use of the theory 
of finite dimensional vector spaces. The reader is referred to [4] for the notions 
of direct sum (denoted here by $\vee$), spectral decomposition, etc. We denote an
operator and the matrix which represents it by the same symbol. 

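Let $V_\nu$ be $s^\nu$-dimensional Euclidean space with components indexed by the
$s^\nu$ $\nu$-tuples $(u_1, \dots, u_\nu)$ with $1 \le u_i \le s$. Let

(2.1)    $p_{u_1 \cdots u_\nu} = (p_{u_1} \cdots p_{u_\nu})^{1/2}$

and let $x$ be the random vector in $V_\nu$ with components

    $x_{u_1 \cdots u_\nu} = (n_{u_1 \cdots u_\nu} - n p^2_{u_1 \cdots u_\nu}) / n^{1/2} p_{u_1 \cdots u_\nu}.$

Then $|x|^2$ is the statistic (1.1). Let $M^{(\nu)}$ be the $s^\nu$ by $s^\nu$ matrix with entries
defined by

    $M^{(\nu)}_{u_1 \cdots u_\nu, v_1 \cdots v_\nu} = \delta_{u_1 v_1} \cdots \delta_{u_\nu v_\nu} - p_{u_1 \cdots u_\nu} p_{v_1 \cdots v_\nu}.$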


For this matrix to be well defined, the v-tuples must be ordered. Which order 
is taken is immaterial so long as it is kept constant throughout the argument. 

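We show first of all that the covariance matrix of $x$ is asymptotically $A^{(\nu)}$,
where $A^{(\nu)}$ is the $s^\nu$ by $s^\nu$ matrix with entries defined by

(2.2)
\[
\begin{aligned}
A^{(\nu)}_{u_1 \cdots u_\nu, v_1 \cdots v_\nu} = M^{(\nu)}_{u_1 \cdots u_\nu, v_1 \cdots v_\nu}
  &+ \sum_{k=1}^{\nu-1} p_{u_1 \cdots u_k}\, p_{v_{\nu-k+1} \cdots v_\nu}\, M^{(\nu-k)}_{u_{k+1} \cdots u_\nu,\, v_1 \cdots v_{\nu-k}} \\
  &+ \sum_{k=1}^{\nu-1} p_{u_{\nu-k+1} \cdots u_\nu}\, p_{v_1 \cdots v_k}\, M^{(\nu-k)}_{u_1 \cdots u_{\nu-k},\, v_{k+1} \cdots v_\nu}.
\end{aligned}
\]
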

Let $\alpha_i$ [$\beta_i$] be 1 or 0 according as $(X_i, \dots, X_{i+\nu-1})$ is the $\nu$-tuple
$(u_1, \dots, u_\nu)$ [$(v_1, \dots, v_\nu)$] or not. Then $n_{u_1 \cdots u_\nu} = \sum_{i=1}^{n} \alpha_i$ and
$n_{v_1 \cdots v_\nu} = \sum_{i=1}^{n} \beta_i$. Let $c(i, j) = \operatorname{cov}(\alpha_i, \beta_j)$. Then $c(i, j) = 0$ if
$|i - j| \ge \nu$ and $c(i, i + k)$ is independent of $i$. Hence, since $|c(i, j)| \le 2$, we have

\[
\begin{aligned}
\operatorname{cov}(n_{u_1 \cdots u_\nu}, n_{v_1 \cdots v_\nu})
  &= \sum_{i=1}^{n} c(i, i) + \sum_{k=1}^{\nu-1} \sum_{i=1}^{n} \bigl(c(i, i+k) + c(i+k, i)\bigr) \\
  &\quad - \sum_{i=n-\nu+2}^{n} \sum_{k=n-i+1}^{\nu-1} \bigl(c(i, i+k) + c(i+k, i)\bigr) \\
  &= n\Bigl[c(1, 1) + \sum_{k=1}^{\nu-1} \bigl(c(1, 1+k) + c(1+k, 1)\bigr)\Bigr] + 2\theta\nu^2
\end{aligned}
\]

with $|\theta| < 1$. Hence in the limit

    $\lim_{n \to \infty} \operatorname{cov}(x_{u_1 \cdots u_\nu}, x_{v_1 \cdots v_\nu}) = p^{-1}_{u_1 \cdots u_\nu}\, p^{-1}_{v_1 \cdots v_\nu} \Bigl(c(1, 1) + \sum_{k=1}^{\nu-1} \bigl(c(1, 1+k) + c(1+k, 1)\bigr)\Bigr).$

But for $k = 1, \dots, \nu - 1$,

    $c(1, 1+k) = \delta_{u_{k+1} v_1} \cdots \delta_{u_\nu v_{\nu-k}}\, p^2_{u_1 \cdots u_\nu}\, p^2_{v_{\nu-k+1} \cdots v_\nu} - p^2_{u_1 \cdots u_\nu}\, p^2_{v_1 \cdots v_\nu}.$

From this expression and similar ones for $c(1, 1)$ and $c(1 + k, 1)$, (2.2) follows.
It is an immediate consequence of the multivariate central limit theorem
for $\nu$-dependent random variables [5] that the distribution of $x$ approaches that
normal distribution having zero means and having $A^{(\nu)}$ as covariance matrix.
We proceed now to find the spectral decomposition of $A^{(\nu)}$. For $\nu > 1$ let
$\mathcal{L}_\nu$ be the set of $t \in V_\nu$ satisfying


(2.3) i ae a 
and 


(2.4) Fe Msi 2 i 
1 


uy 


for all (uw, --- , u,). It follows from (2.3) and (2.4) that fork = 1, ---,» and 
Sd, «+ bd 


’ 


(2.5) 2 Pus---ug busy---u, = 7 Prusy--- tsp bess g pee ep teye etsy . 
Uyrr* Uy Urry 


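Let $\mathcal{L}_1$ be the set of $t \in V_1$ for which $\sum_u p_u t_u = 0$ and let $\mathcal{L}_0$ consist of the
number 0 alone. For $\nu \ge 1$, define a linear mapping $\Pi_\nu : \mathcal{L}_\nu \to \mathcal{L}_{\nu-1}$ by
$(\Pi_\nu t)_{u_1 \cdots u_{\nu-1}} = \sum_u p_u t_{u u_1 \cdots u_{\nu-1}}$. That $t \in \mathcal{L}_\nu$ implies $\Pi_\nu t \in \mathcal{L}_{\nu-1}$
can be verified by computation. For $\nu \ge 2$, define a second linear mapping
$\Omega_{\nu-1} : \mathcal{L}_{\nu-1} \to \mathcal{L}_\nu$ by

(2.6)    $(\Omega_{\nu-1} t)_{u_1 \cdots u_\nu} = p_{u_1} t_{u_2 \cdots u_\nu} + p_{u_\nu} t_{u_1 \cdots u_{\nu-1}} - p_{u_1} p_{u_\nu} (\Pi_{\nu-1} t)_{u_2 \cdots u_{\nu-1}}.$

If $\nu = 2$ the last term in (2.6) is to be omitted. Again a computation shows that
$\Omega_{\nu-1} t \in \mathcal{L}_\nu$ if $t \in \mathcal{L}_{\nu-1}$. From these definitions it follows that

(2.7)    $\Pi_\nu \Omega_{\nu-1} t = t, \qquad t \in \mathcal{L}_{\nu-1}.$

Let $\mathcal{L}_\nu^0$ be the set of $t \in \mathcal{L}_\nu$ such that $\Pi_\nu t = 0$. Then

(2.8)    $\mathcal{L}_\nu = \mathcal{L}_\nu^0 \vee \Omega_{\nu-1}(\mathcal{L}_{\nu-1}).$

In fact if $t \in \mathcal{L}_\nu$, then $t = (t - \Omega_{\nu-1}\Pi_\nu t) + \Omega_{\nu-1}\Pi_\nu t$, while
$t - \Omega_{\nu-1}\Pi_\nu t \in \mathcal{L}_\nu^0$ and $\Omega_{\nu-1}\Pi_\nu t \in \Omega_{\nu-1}(\mathcal{L}_{\nu-1})$. And if
$t \in \mathcal{L}_\nu^0 \cap \Omega_{\nu-1}(\mathcal{L}_{\nu-1})$, then $t = \Omega_{\nu-1} t'$ and $0 = \Pi_\nu \Omega_{\nu-1} t' = t'$,
so $t = 0$.

If $t \in \mathcal{L}_\nu$ then $M^{(\nu)} t = t$. From this and (2.5) it follows that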
(y—k 
Z Po,—kgice? ..- UP 1p ak toy ee, 
©4°°*Sy 


= 7 pO See ie eine = (T1241 eee | ee 


Vy" *Py ek 


Using this relation and the symmetric one, we get 


i. slits Raa: rote “+ > Puy ~ up (Tres iex |) 
(2.9) 


+ ie Puy—ag ese, (Leys oon 
k=1 
forte £,. If vy => 3andt ¢ L,,, then by (2.7) and (2.9) we have 


vy—1 


((A”9,_1 — una ene, = Pu,l g***t, + aX Puy--+ uy (Tl,-e41 oer a rn 


en ee > ii laa ak os 
From this, using (2.9) again, one shows by a long but straightforward calcula- 
tion that forv = 3 
(2.10) s"2,... ~ 2. = 64.4" 
on £,.1. 
We next show that for »y = 2 


(2.11) AM = T+ : Qs -*+ Gish --- OL 


on £,. The proof goes by induction. The verification being simple for » = 2, 
assume (2.11) holds with » replaced by » — 1. Then by (2.10) and (2.7) we 
have 


A” 0,1 = (7 + ae Q1 e+ Qill --- i) 2-1. 


k=2 


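In other words, it follows that (2.11) holds on $\Omega_{\nu-1}(\mathcal{L}_{\nu-1})$. Since it obviously
holds on $\mathcal{L}_\nu^0$, it follows by (2.8) that (2.11) holds on all of $\mathcal{L}_\nu$.
Let $\mathfrak{M}_1 = \mathcal{L}_\nu^0$ and for $\lambda = 2, \dots, \nu$ let

(2.12)    $\mathfrak{M}_\lambda = \Omega_{\nu-1} \Omega_{\nu-2} \cdots \Omega_{\nu+1-\lambda}\, \mathcal{L}^0_{\nu+1-\lambda}.$

It follows from (2.8) by induction that $\mathcal{L}_\nu = \mathfrak{M}_1 \vee \cdots \vee \mathfrak{M}_\nu$. Using (2.11) one
easily shows that for $\lambda = 1, \dots, \nu$,

(2.13)    $A^{(\nu)} t = \lambda t \quad \text{if } t \in \mathfrak{M}_\lambda.$
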

Let $\sigma \in V_\nu$ be the vector whose $(u_1, \dots, u_\nu)$-th component is $p_{u_1 \cdots u_\nu}$ and let
$\sigma(v_1, \dots, v_{\nu-1}) \in V_\nu$ be the vector whose $(u_1, \dots, u_\nu)$-th component is
$p_{u_1} \delta_{u_2 v_1} \cdots \delta_{u_\nu v_{\nu-1}} - p_{u_\nu} \delta_{u_1 v_1} \cdots \delta_{u_{\nu-1} v_{\nu-1}}$. Let $\mathfrak{M}_0$ be the manifold
generated by $\sigma$ and the $s^{\nu-1}$ vectors $\sigma(v_1, \dots, v_{\nu-1})$. By definition $\mathcal{L}_\nu$ is the
orthogonal complement of $\mathfrak{M}_0$, so that

(2.14)    $V_\nu = \mathfrak{M}_0 \vee \mathfrak{M}_1 \vee \cdots \vee \mathfrak{M}_\nu.$

Direct computations show that $A^{(\nu)} \sigma = A^{(\nu)} \sigma(v_1, \dots, v_{\nu-1}) = 0$, so that (2.13)
holds for $\lambda = 0$. Thus each $\mathfrak{M}_\lambda$, $0 \le \lambda \le \nu$, consists of eigenvectors with
eigenvalue $\lambda$. These are all the invariant subspaces, in view of (2.14).

We now compute the dimension of $\mathcal{L}_\nu^0$. It is easy to show that $\dim \mathcal{L}_1^0 = s - 1$.
Suppose $\nu \ge 2$. To say that $t \in \mathcal{L}_\nu^0$ is to say that for all $u_1, \dots, u_{\nu-1}$

(2.15)    $\sum_u p_u t_{u_1 \cdots u_{\nu-1} u} = \sum_u p_u t_{u u_1 \cdots u_{\nu-1}} = 0.$

Let $X$ be the $s^{\nu-1}$ by $s^\nu$ matrix with entries

    $X_{v_1 \cdots v_{\nu-1},\, u_1 \cdots u_\nu} = p_{u_\nu} \delta_{u_1 v_1} \delta_{u_2 v_2} \cdots \delta_{u_{\nu-1} v_{\nu-1}}$

and let $Y$ be the $s^{\nu-1}$ by $s^\nu$ matrix with entries

    $Y_{v_1 \cdots v_{\nu-1},\, u_1 \cdots u_\nu} = p_{u_1} \delta_{u_2 v_1} \cdots \delta_{u_\nu v_{\nu-1}}.$

The partitioned matrix

    $Z = \begin{bmatrix} X \\ Y \end{bmatrix}$

is the matrix of the system (2.15), i.e., $t$ lies in $\mathcal{L}_\nu^0$ if and only if $t$ is orthogonal
to each row of $Z$. In order to find the (column) rank of $Z$ let $A$ and $B$ be column
vectors with $s^{\nu-1}$ components $A_{u_1 \cdots u_{\nu-1}}$ and $B_{u_1 \cdots u_{\nu-1}}$ respectively, and let $C$
be the partitioned vector

    $C = \begin{bmatrix} A \\ B \end{bmatrix}.$

Now $C$ is orthogonal to each column of $Z$ if and only if
$p_{u_\nu} A_{u_1 \cdots u_{\nu-1}} = -p_{u_1} B_{u_2 \cdots u_\nu}$. Thus if for a set $\{D_{u_2 \cdots u_{\nu-1}}\}$ of $s^{\nu-2}$ numbers
we let $A_{u_1 \cdots u_{\nu-1}} = p_{u_1} D_{u_2 \cdots u_{\nu-1}}$ and $B_{u_2 \cdots u_\nu} = -p_{u_\nu} D_{u_2 \cdots u_{\nu-1}}$, then $C$ is
orthogonal to the columns of $Z$. Conversely, if $C$ is orthogonal to these columns,
it can be cast in this form. Hence the subspace of $2s^{\nu-1}$-dimensional space
orthogonal to the subspace generated by the columns of $Z$ has dimension $s^{\nu-2}$.
Therefore $Z$ has rank $2s^{\nu-1} - s^{\nu-2}$ and $\dim \mathcal{L}_\nu^0 = s^{\nu-2}(s - 1)^2$. It now follows by
(2.12) and (2.14) that $\dim \mathfrak{M}_0 = s^{\nu-1}$, $\dim \mathfrak{M}_\nu = s - 1$, and
$\dim \mathfrak{M}_\lambda = s^{\nu-\lambda-1}(s - 1)^2$ for $\lambda = 1, \dots, \nu - 1$.

We now have the dimensions of the invariant subspaces and hence the
multiplicities of the eigenvalues of $A^{(\nu)}$. Since the distribution of $x$ is asymptotically
normal with covariance matrix $A^{(\nu)}$, it follows by an obvious generalization of
the result of Section 24.5 of [6] that the distribution of $|x|^2$, or (1.1), approaches
(1.2), under the null hypothesis.
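
The dimension count can be cross-checked arithmetically: the multiplicities should add up to $s^\nu$, and the mean of the limiting distribution (1.2), namely the sum of eigenvalue times multiplicity, should equal the value $s^\nu - 1$ quoted from Good in Section 1. The sketch below assumes the multiplicities as read from the (partly illegible) text above; the function names are hypothetical.

```python
def eigen_multiplicities(s, nu):
    """Multiplicity of each eigenvalue 0, 1, ..., nu of A^(nu).

    Assumed reading of the dimensions in Section 2:
      eigenvalue 0  -> s**(nu - 1)
      eigenvalue l  -> s**(nu - l - 1) * (s - 1)**2   for l = 1, ..., nu - 1
      eigenvalue nu -> s - 1
    """
    mult = {lam: s ** (nu - lam - 1) * (s - 1) ** 2 for lam in range(1, nu)}
    mult[nu] = s - 1
    mult[0] = s ** (nu - 1)
    return mult


def limiting_mean(s, nu):
    """Mean of (1.2): a chi-square with r d.f. scaled by lam has mean lam * r."""
    return sum(lam * m for lam, m in eigen_multiplicities(s, nu).items())


for s in range(2, 6):
    for nu in range(1, 6):
        assert sum(eigen_multiplicities(s, nu).values()) == s ** nu
        assert limiting_mean(s, nu) == s ** nu - 1
```

Dropping the eigenvalue-$\nu$ term gives the corresponding multiplicities for (1.4).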


3. Asymptotic distribution of (1.3). We assume now that $\{X_n\}$ is independent
and stationary, but we regard the $p_k$ as unknown. In fact, let $p_1, \dots, p_{s-1}$
be parameters to be estimated, define $p_s$ by $p_s = 1 - \sum_{k=1}^{s-1} p_k$, and let
$p_{u_1 \cdots u_\nu}$ be a function of $p_1, \dots, p_{s-1}$ defined by (2.1).

Now it is easy to show that the values p, which maximize 


II (puj--+u,)!ro'* 


Ujyrrt Uy, 


are $\hat{p}_k = n_k/n + \epsilon_k$, where $\epsilon_k = O(1/n)$. And now the reasoning of Section
30.3 of [6] becomes applicable. Let $B$ be the $s^\nu$ by $s - 1$ matrix with entries
$B_{u_1 \cdots u_\nu, u} = p^{-1}_{u_1 \cdots u_\nu}\, \partial p^2_{u_1 \cdots u_\nu} / \partial p_u$, $1 \le u_i \le s$, $1 \le u < s$. Let $y$ be the random
vector which results from substituting the estimate $n_k/n$ for $p_k$ in $x$. Then
$|y|^2$ is (1.3). In order that the theorem of Section 30.3 of [6] be directly applicable
it would be necessary that $\{n_{u_1 \cdots u_\nu}\}$ be a sample from a multinomial universe.
However, since the $x$ defined above is asymptotically normal with covariance
matrix $A^{(\nu)}$, since $B$ has rank $s - 1$ and since $x_{u_1 \cdots u_\nu} = o(n^{1/2})$ in probability
(as is easily shown), a perusal of the proof of the theorem referred to shows
that we are in the present case justified in concluding that the distribution of $y$
approaches that normal distribution with zero means and covariance matrix
$\Lambda A^{(\nu)} \Lambda'$, where $\Lambda = I - B(B'B)^{-1}B'$.

We now find the spectral decomposition of $\Lambda A^{(\nu)} \Lambda'$. Let $K$ be the $s$ by $s - 1$
matrix with entries $K_{uv}$, where $K_{uv} = \delta_{uv}$ if $u < s$ and $K_{sv} = -1$. Let $J$
be the $s^\nu$ by $s$ matrix with entries


J = Di ry 
Uy Uy, u — G1 Puyes ug yugy su, Puy uPu - 


Then B = JK. 

If t ¢ £, it follows from (2.5) that (J’t), = vp. (Il, --- IL¢),. From this it 
follows that J’t = 0 forte IMiv--- v M1. If o(vy,, --- , v,-1) is defined as in 
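Section 2, then, as a direct computation shows, $J'\sigma(v_1, \dots, v_{\nu-1}) = 0$. Moreover,
$(J'\sigma)_u = \nu$, so that $B'\sigma = 0$. Hence $B't = 0$ for $t \in \mathfrak{M}_0 \vee \cdots \vee \mathfrak{M}_{\nu-1}$. Now
the matrix $\Lambda$ is symmetric and idempotent, so that, viewed as an operator,
it is a perpendicular projection on the manifold $\mathfrak{N} = \{t : \Lambda t = t\} =
\{t : B(B'B)^{-1}B't = 0\}$. It is easy to show that the rank of $B(B'B)^{-1}B'$ is the
same as that of $B$, viz., $s - 1$. Hence $\dim \mathfrak{N} = s^\nu - s + 1$. We have shown
that $\mathfrak{M}_0 \vee \cdots \vee \mathfrak{M}_{\nu-1} \subset \mathfrak{N}$, and since $\dim(\mathfrak{M}_0 \vee \cdots \vee \mathfrak{M}_{\nu-1}) = s^\nu - s + 1$
(cf. Section 2), we have $\mathfrak{M}_0 \vee \cdots \vee \mathfrak{M}_{\nu-1} = \mathfrak{N}$. The manifolds $\mathfrak{M}_\lambda$, being the
invariant spaces of the symmetric matrix $A^{(\nu)}$, are mutually orthogonal. Hence
$\mathfrak{M}_\nu$ is the orthogonal complement of $\mathfrak{N}$ and $\Lambda t = 0$ for $t \in \mathfrak{M}_\nu$. Therefore
$\Lambda A^{(\nu)} \Lambda' t = \lambda t$ if $t \in \mathfrak{M}_\lambda$ with $1 \le \lambda < \nu$, while $\Lambda A^{(\nu)} \Lambda' t = 0$ if
$t \in \mathfrak{M}_0 \vee \mathfrak{M}_\nu$. Finally $\dim \mathfrak{M}_\lambda = s^{\nu-\lambda-1}(s - 1)^2$ for $\lambda = 1, \dots, \nu - 1$ and
$\dim \mathfrak{M}_0 \vee \mathfrak{M}_\nu = s^{\nu-1} + s - 1$.

Thus we have the eigenvalues of $\Lambda A^{(\nu)} \Lambda'$, with their multiplicities, and it
follows as in Section 2 that the distribution of $|y|^2$, or (1.3), approaches (1.4).

THE SURPRISE INDEX FOR THE MULTIVARIATE
NORMAL DISTRIBUTION


By I. J. Good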


1. The surprise index and its generalisations. Let $E_1, E_2, E_3, \dots$ be a
natural classification into a finite or countably infinite number of possible
mutually exclusive and exhaustive results of some experiment or observation, and
let $P(E_i \mid H) = p_i$ $(i = 1, 2, 3, \dots)$, where $H$ is a simple statistical hypothesis.
Then the surprise index (Weaver [7]) associated with the result $E_i$ is

(1)    $\lambda_1 = \frac{E(p_i \mid H)}{p_i} = \frac{\sum_j p_j^2}{p_i}.$

If the experiment consists in the measurement of a continuous vector or
scalar variable with a differentiable distribution function, we define

(2)    $\lambda_1 = \frac{E(p^*)}{p},$

where $p^*$ is the random variable that is the probability density of the original
random variable, and $p$ is a realisation of $p^*$.
For practical purposes, (2) is almost the same definition as (1). For example,
a continuous scalar variable is usually measured to some fixed number, $n$, of
decimal places, and the natural classification of the possible results of the
experiment is into intervals of length $10^{-n}$ of values of the variable. If we then use
definition (1) and let $n$ tend to infinity, we get definition (2). For experiments
with results that are real variables having distributions that are partly discrete
(atomic) and partly continuous (differentiable), it is not immediately obvious
what definition should be used. Something more will be said about this later.
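
Definition (1) is easy to evaluate for a discrete distribution. The following sketch uses a binomial distribution as an example; the function names and the parameter values are chosen here for illustration only.

```python
from math import comb


def surprise_index(probs, i):
    """Weaver's index (1): sum_j p_j^2 / p_i for the observed category i."""
    return sum(p * p for p in probs) / probs[i]


def binomial_pmf(n, theta):
    """Probabilities of 0, 1, ..., n successes in n independent trials."""
    return [comb(n, r) * theta ** r * (1.0 - theta) ** (n - r) for r in range(n + 1)]


# Hypothetical example: number of heads in twenty spins of a fair coin.
pmf = binomial_pmf(20, 0.5)
rare = surprise_index(pmf, 0)      # no heads at all
typical = surprise_index(pmf, 10)  # ten heads
print(rare, typical)
```

When all categories are equally probable the index equals 1 for every result, whatever the number of categories.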

The surprise index is open to two criticisms: 

(I) It is changed when the results of an experiment are lumped together 
in a new way, in the discrete case, or when there is a change of mathematically 
independent variable in the continuous case. 

(II) The numerator in (1) or (2) is somewhat arbitrary. 

We shall now discuss these two criticisms. 

As an example of (I), suppose that an “unbiased coin” is spun twenty times.¹
There is an obvious classification of the possible results of the experiment into
$2^{20}$ categories. But, with this classification,

(3)    HTHTHTHTHTHTHTHTHTHT

has the same surprise index as

(4)    HTTHTHHHTTHHHHTTTHTH.

In practice, (3) would be more surprising than (4), at any rate if neither of them
had been written down in advance of the experiment. This is partly because
(3) is simpler. (The reader should avoid being confused by the two meanings
of the word “simple.” We use the word in its technical sense only in the phrase
“simple statistical hypothesis,” while in “simple hypothesis,” the word has its
ordinary non-technical meaning.)

Received September 22, 1955.

¹ By putting the description “unbiased coin” in quotation marks, we intend to imply
that a certain self-explanatory simple statistical hypothesis is to be taken for granted, and
the probabilities of the possible results are the tautological ones usually associated with this
idealised experiment.

If we imagine that the $2^{20}$ possible results are classified into groups of roughly
equal simplicity, (3) would belong to a small group, whereas (4) would belong 
to a large group. If we regard all the results in one group as a single possible 
result, it follows that (3) would, after all, have a higher surprise index than (4). 
Thus the vagueness of definition (1) is seen to arise from the difficulty of meas- 
uring simplicity. (With regard to the regrouping of results, see Bartlett [1], 
page 231.) 

The connection between surprise and simplicity can be defended by the 
following argument. 

Perhaps the main biological function of surprise is to jar us into reconsidering 
the validity of some hypothesis that we had previously accepted. Hence, we 
tend to be surprised when evidence is received against such a hypothesis, i.e., 
when the result of an observation has much greater probability when given 
some other, not entirely untenable, hypothesis. But in the process of being 
surprised, we often do not have time to estimate the initial probability of the 
rival hypothesis; instead, we tend to notice whether the rival hypothesis is very 
simple. More formally, we are surprised if $E$ occurs when the likelihood ratio
$P(E \mid H') / P(E \mid H)$ is large, where $H$ was previously believed and $H'$ is very
simple.

Fortunately, simple hypotheses often have higher initial credibilities than 
complicated ones, so that the capacity of surprise leads to the discovery of new 
truths. 

In the above example, a hypothesis that would explain (3) would be that the 
coin always, or very often, rotates by the same (odd) number of half-revolu- 
tions. 

Since no one has yet thought of a satisfactory measure of simplicity, it seems 
unlikely that a really satisfactory measure of surprise can be given. For an 
experiment whose result is naturally expressed as a single integer, the difficulty 
does not seem to matter greatly. It is true that we may be temporarily surprised 
because the integer has striking properties, like those of 10,000 or 22,222, but 
we are often able to discount this sort of surprise as being due to a “mere
coincidence” and as being dependent on the irrelevancy that we use radix 10.

Obvious examples of experiments whose results are integers are those giving
rise to binomial and Poisson distributions. For these, $\lambda_1$ has been evaluated
by Redheffer [6]: for the binomial distribution, $\lambda_1$ is expressible in terms of the
sums of the squares of the binomial terms (not coefficients) and therefore in
terms of Legendre polynomials. Outside the range of existing tables, the Legendre
polynomials that occur here may be conveniently computed with the help of a
formula given by Good [4].

We now consider criticism (II). A generalisation of the surprise index, with a
more general numerator, has been briefly discussed by Good [3]. Let

    $\lambda_u = \frac{[E(p^{*u})]^{1/u}}{p} \qquad (u > 0),$

    $\lambda_0 = \exp\{E(\log p^*) - \log p\} = \mathrm{G.E.}(p^*)/p,$

(where G.E. means “geometric expectation”), and let

    $\Lambda_u = \log \lambda_u \qquad (u \ge 0).$

We may call $\Lambda_u$ a “logarithmic surprise index.” It can be seen at once that
$\lambda_u$ $(u \ge 0)$ is multiplicative, whereas $\Lambda_u$ is additive, if the results of several
statistically independent experiments are combined into a single experiment.
Weaver [7] did not allow his surprise index to be less than 1, but it is necessary
to do so in order to achieve multiplicativity. A negative logarithmic surprise
index corresponds to an event that “was only to be expected.”
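
The multiplicative property can be verified directly for discrete experiments. This is a sketch only: the function names and the two small distributions are hypothetical, and the discrete analogue of $\lambda_u$ (with probabilities in place of densities) is assumed.

```python
from math import exp, log


def lambda_u(probs, i, u):
    """Generalised surprise index for observed category i (discrete case).

    u > 0: [E(p^u)]^(1/u) / p_i, where E(p^u) = sum_j p_j^(u+1).
    u = 0: exp{E(log p) - log p_i}, the geometric-expectation version.
    """
    if u == 0:
        return exp(sum(p * log(p) for p in probs if p > 0) - log(probs[i]))
    return sum(p ** (u + 1) for p in probs) ** (1.0 / u) / probs[i]


def product_experiment(p, q):
    """Joint probabilities for two independent experiments (row-major order)."""
    return [a * b for a in p for b in q]


# Hypothetical distributions for two independent experiments.
p = [0.2, 0.8]
q = [0.5, 0.3, 0.2]
joint = product_experiment(p, q)

for u in (0, 1, 2.5):
    for i in range(len(p)):
        for j in range(len(q)):
            combined = lambda_u(joint, i * len(q) + j, u)
            assert abs(combined - lambda_u(p, i, u) * lambda_u(q, j, u)) < 1e-9
```

Taking logarithms of the same identity gives the additivity of $\Lambda_u$.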

Of the continuous infinity of surprise indexes, the most natural ones seem to
be $\lambda_1$ and $\lambda_0$, or, equivalently, $\Lambda_1$ and $\Lambda_0$. Bartlett [1] discussed $\Lambda_0$, but not in
relation to Weaver's suggestion. We shall argue below that $\lambda_0$ (or $\Lambda_0$) is rather
better than $\lambda_1$, at any rate for multivariate normal distributions. For univariate
normal distributions, there is little difference between $\Lambda_0$ and $\Lambda_1$.

Before going on to this, we shall digress for a moment in order to discuss (i) 
distributions that are partly discrete (as promised earlier) and (ii) tail-area 
probabilities. 


2. Partly discrete distributions. The above reference to multiplicativity
suggests a possible definition of $\lambda_u$ for a univariate distribution that is partly
discrete and partly continuous. We can first classify the possible results into
“atomic” on the one hand and “non-atomic” on the other. This is a two-category
(discrete) classification for which $\lambda_u'$ may be defined as $\lambda_u$ was before. Then, if
the observed value of the random variable is atomic (or non-atomic), we can
compute a conditional $\lambda_u''$, i.e., conditional on the information that the variable
is an atomic one (or a non-atomic one). Finally, we can define $\lambda_u = \lambda_u' \lambda_u''$.


3. Tail-area probabilities. The so-much-or-more method in statistics is the
usual method in which the result of an experiment or observation is summarised
by means of a tail-area probability

    $P(x^* > x)$, $P(x^* \ge x)$, or $P(x^* > x) + \tfrac{1}{2} P(x^* = x)$,

where $x^*$ is a real random variable and $x$ is a real number. This method is most
satisfying when $x^*$ is the likelihood of an experiment (given a null hypothesis),
but in this case the tail-area probability is often difficult to evaluate numerically.
Moreover, there are again logical difficulties for distributions that are partly
discrete and partly continuous.

The reciprocal of a tail-area probability is often not more than about 10
times the Bayes factor against the null hypothesis, calculated in accordance
with some reasonable assumptions about the initial distributions and
probabilities. (See Good [2], page 94.) When the ratio is greater than about 10, there
is likely to be some argument about which is the better statistic. This difficulty
can easily arise for bimodal distributions.

Jeffreys [5], page 316, says “What the use of P (a tail-area probability)
implies, therefore, is that a hypothesis that may be true may be rejected because
it has not predicted observable results that have not occurred.” In other words,
a tail-area probability consists in the probability of an experimental result
artificially added to the probabilities of results that did not occur, or, if not artificially,
at any rate with incomplete logical justification.

$\lambda_u$, for any $u$, as a final summary of an experiment or observation, overcomes
Jeffreys' criticism of a tail-area probability, although it may still be
unsatisfactory as compared with upper and lower bounds for a Bayes factor when
we are prepared to assume enough about the non-null hypothesis. For a
distribution with density such as


im (corn + e o-O2) 


V8 


$\lambda_u$ is apt to be a much better summary of the experiment than a tail-area
probability would be. But it can be argued that better still would be the tail-area
probability associated with the value of $\lambda_u$. This would come to the same
thing as the use of the distribution of the likelihood or the likelihood density.
(The possibility of using Weaver's surprise index, $\lambda_1$, as a substitute for the use of
tail-area probabilities was suggested in conversation by Mr. G. C. Wall.)


4. $\lambda_u$ for multivariate normal distributions. For multivariate normal
distributions, $P(p^* < p)$, the distribution of the likelihood density, does not seem
to be expressible in elementary terms. It is therefore perhaps more worth while
to compute $\lambda_u$ for the multivariate normal distribution than for the Poisson and
binomial distributions.

A $k$-dimensional multivariate normal distribution has a density function of
the form

    $p = \frac{|A|^{1/2}}{(2\pi)^{k/2}} \exp\Bigl\{-\tfrac{1}{2} \sum_{i,j=1}^{k} A_{ij}(x_i - a_i)(x_j - a_j)\Bigr\},$

where $|A| = \det\{A_{ij}\}$. (See, for example, Wilks [8], p. 65.)

Now, it is easily seen that for any k-dimensional probability density, the
generalised surprise indexes $\lambda_u$, $\Lambda_u$ $(u \ge 0)$ are invariant under all non-singular
linear transformations.² This observation follows from the fact that the Jacobian
of such a transformation is constant and non-zero. Therefore, for non-degenerate
k-dimensional multivariate normal distributions, there is no real loss of generality
in taking $A = I$ (the identity matrix) and $a_1 = a_2 = \cdots = a_k = 0$. For this
standardised distribution, we can use the multiplicative property of $\lambda_u$ for
probabilistically independent experiments, together with a simple univariate
integration, to evaluate $\lambda_u$. Then, transforming back to the general non-singular
distribution, we get

    $\lambda_u = \dfrac{1}{(u+1)^{k/(2u)}} \exp\Bigl\{\tfrac{1}{2} \sum A_{ij}(x_i - a_i)(x_j - a_j)\Bigr\} \qquad (u > 0),$

    $\Lambda_u = \tfrac{1}{2} \sum A_{ij}(x_i - a_i)(x_j - a_j) - \dfrac{k}{2u} \log (u + 1) \qquad (u > 0),$

    $\Lambda_0 = \tfrac{1}{2} \Bigl\{ \sum A_{ij}(x_i - a_i)(x_j - a_j) - k \Bigr\}.$

² The method of this paragraph is due to the referee; my own method was clumsier.


It may be observed that $\Lambda_u$ (and therefore $\lambda_u$), regarded as a function of u,
is continuous to the right at u = 0. By writing $u = e^v - 1$, we see at once
that $\Lambda_u$ is a strictly increasing function of u. When $u \to \infty$, $\Lambda_u$ tends to

    $\Lambda_\infty = \tfrac{1}{2} \sum A_{ij}(x_i - a_i)(x_j - a_j).$


From this expression it is clear that $\Lambda_\infty$ is the logarithm of the likelihood ratio
in the sense of Wilks [8] for testing the hypothesis of our multivariate normal
distribution "within" the more general class of multivariate normal distributions
that have the same matrix $\{A_{ij}\}$, or, what comes to the same thing, the
same covariance matrix.
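The formulas for the log-surprise indexes can be checked numerically. The following is a minimal sketch, assuming the matrix A and the vectors x and a are given as plain Python lists; the function names `quad_form` and `log_surprise` are illustrative and do not come from the paper.

```python
import math

def quad_form(A, x, a):
    """Q = sum_{i,j} A_ij (x_i - a_i)(x_j - a_j)."""
    d = [xi - ai for xi, ai in zip(x, a)]
    k = len(d)
    return sum(A[i][j] * d[i] * d[j] for i in range(k) for j in range(k))

def log_surprise(A, x, a, u):
    """Lambda_u = Q/2 - (k / 2u) log(u + 1), with the limits u = 0 and u = inf."""
    k = len(x)
    q = quad_form(A, x, a)
    if u == 0:
        return 0.5 * (q - k)          # limit as u -> 0
    if math.isinf(u):
        return 0.5 * q                # limit as u -> infinity
    return 0.5 * q - k / (2.0 * u) * math.log1p(u)

# Hypothetical example values (not from the paper).
A = [[2.0, 0.5], [0.5, 1.0]]
a = [0.0, 1.0]
x = [1.5, -0.5]
values = [log_surprise(A, x, a, u) for u in (0, 0.5, 1, 10, 1000, math.inf)]
print(values)
```

The printed values should increase with u, from the u = 0 limit up to half the quadratic form, in line with the monotonicity remark above.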

It is known (see, for example, Wilks [8], page 104) that $2\Lambda_\infty$ has precisely a
chi-squared (gamma-variate) distribution with k degrees of freedom. Since

    $\Lambda_u = \Lambda_\infty - \dfrac{k}{2u} \log (u + 1),$


we can obtain the exact tail-area probability corresponding to any observed
value of $\Lambda_u$. But we may also develop an intuitive appreciation of $\Lambda_u$ (or $\lambda_u$)
in itself, for some fixed value of u. In order to decide which is the most natural
value of u to take, we note that $E(\Lambda_0) = 0$. (This is obvious from the definition
of $\Lambda_0$ and also from the fact that $2\Lambda_0 + k$ has a chi-squared distribution with
k degrees of freedom.) It seems natural to demand that the expected log-surprise
should be zero before an experiment is performed. It is not equally natural
to insist that $E(\lambda_u) = 1$, or that $E(\lambda_u^{-1}) = 1$ (which gives u = 1), since $\lambda_u$ and
$\lambda_u^{-1}$ have very skew distributions. For very skew distributions, expected values
are more artificial than for ordinary distributions such as the chi-squared. For
one thing, the median is a long way from the expected value for very skew
distributions.

We conclude, then, that for the k-dimensional multivariate normal distribution,
$\Lambda_0$ and $\lambda_0$ seem more natural measures of surprise than $\Lambda_1$ and $\lambda_1$, whereas other
values of u do not seem to have anything special to commend them. There is
little difference between $\Lambda_0$ and $\Lambda_1$ when k is small.

For k = 1, we have

    $\lambda_0 = e^{s^2/2} / \sqrt{e}, \qquad \lambda_1 = e^{s^2/2} / \sqrt{2},$

where s is the "sigma-age" of an observation; i.e., the deviation from the mean
divided by the standard deviation. Some numerical values of $\lambda_0$ and $\lambda_1$ are
given in the following table, together with the reciprocals of the corresponding
two-tailed tail-area probabilities, P(s).





        s              4          5
        1/P(s)       16000
        $\lambda_0$          1800     160000
        $\lambda_1$          2100     187000

(Only the entries for the largest values of s are legible in this copy.)
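The k = 1 formulas make it easy to regenerate such a table. The sketch below is an illustration only: the function names are hypothetical, and the two-tailed probability is taken to be $P(s) = P(|Z| > s)$ for a standard normal Z, computed with the complementary error function.

```python
import math

def lambda0(s):
    """lambda_0 = exp(s^2 / 2) / sqrt(e) for k = 1."""
    return math.exp(0.5 * s * s - 0.5)

def lambda1(s):
    """lambda_1 = exp(s^2 / 2) / sqrt(2) for k = 1 (Weaver's surprise index)."""
    return math.exp(0.5 * s * s) / math.sqrt(2.0)

def inv_two_tailed(s):
    """1 / P(s), where P(s) = P(|Z| > s) = erfc(s / sqrt(2))."""
    return 1.0 / math.erfc(s / math.sqrt(2.0))

for s in range(1, 6):
    print(s, round(inv_two_tailed(s), 1), round(lambda0(s), 1), round(lambda1(s), 1))
```

For s = 4 the computed values are close to the rounded figures 16000, 1800 and 2100 that survive in the table.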





If we have a sample of several independent observations (k-dimensional
vectors) from our multivariate normal distribution, we can compute $\lambda_0$ for the
whole sample by multiplying together the separate $\lambda_0$'s. This method may be
regarded as an alternative to Hotelling's generalised "Student" test. (See, for
example, Wilks [8], Section 11.4, where further references are given.)


DISTRIBUTIONS OF ROOTS OF QUADRATIC EQUATIONS WITH
RANDOM COEFFICIENTS¹


By John W. Hamblen


Oklahoma Agricultural and Mechanical College 


General Problem. The problem under consideration is, given the joint p.d.f.
of the coefficients of an algebraic equation which can be expressed in polynomial
form, to determine the joint p.d.f. of the roots and their marginal p.d.f.'s.
Complete results are obtainable for the quadratic.

1. Introduction. Consider an algebraic equation which can be written in
polynomial form as

    (1.1)    $\eta^n - \xi_1 \eta^{n-1} + \xi_2 \eta^{n-2} - \cdots + (-1)^n \xi_n = 0,$

where the coefficients, $\xi_i$ $(i = 1, \cdots, n)$, are real or complex random variables
with a given joint p.d.f. The roots of (1.1), $\eta_i$ $(i = 1, \cdots, n)$, are random variables
which have a p.d.f. that depends upon the p.d.f. of the coefficients. To obtain
the joint p.d.f. of the $\eta_i$ it is apparent that we must consider the two cases,
when the coefficients are real and complex, separately. Furthermore, when the
coefficients are real the roots may be either real or complex and hence require
separate treatment. The case where the $\xi_i$ are complex random variables was
considered in a note by M. A. Girshick [1]. When the $\xi_i$ are real, the $\eta_i$ may be
real or complex. For real $\eta_i$ the functional form of their p.d.f. is obtained by a
change of variables in the p.d.f. of the $\xi_i$ by the use of the relationships

    (1.2)    $\xi_1 = \sum_{i=1}^{n} \eta_i, \qquad \xi_2 = \sum_{i<j} \eta_i \eta_j, \qquad \cdots, \qquad \xi_n = \prod_{i=1}^{n} \eta_i,$

with Jacobian, J, given by $\prod_{i<j} (\eta_i - \eta_j)$. For complex $\eta_i$ the treatment is
similar, but a new set of relationships must be found to replace (1.2). In this
case, we must be able to express the $\xi_i$ as functions of the real and imaginary
parts of the $\eta_i$ separately.


2. Limitations. We can now see that there are two major problems involved
in determining the p.d.f. of the roots of (1.1) explicitly. The functional form of
the p.d.f. can be obtained without difficulty. However, we must be able to
determine what regions of the coefficient space will give rise to real roots and what
regions will give complex roots. Secondly, after having identified these regions
we must be able to define their transforms into the root space. At present,
complete results are obtainable only for the quadratic.


3. Quadratic. For n = 2 we have

    (3.1)    $\eta^2 - \xi_1 \eta + \xi_2 = 0,$

where $\xi_1$ and $\xi_2$ are real random variables and hence may be any real-valued,
Borel-measurable functions of real random variables.

Received August 29, 1955; revised February 16, 1956.


The roots, $\eta_1$ and $\eta_2$, of (3.1) are random variables associated with $\xi_1$ and
$\xi_2$ by the relationships

    (3.2)    $\eta_1 = \dfrac{\xi_1}{2} + \sqrt{\dfrac{\xi_1^2}{4} - \xi_2}, \qquad \eta_2 = \dfrac{\xi_1}{2} - \sqrt{\dfrac{\xi_1^2}{4} - \xi_2},$

    (3.3)    $\xi_1 = \eta_1 + \eta_2, \qquad \xi_2 = \eta_1 \cdot \eta_2.$

$\eta_1$ and $\eta_2$ are either both real or are complex conjugates. From (3.2) we see
at once that all points belonging to the "interior" of the parabola $\xi_2 = \xi_1^2 / 4$
will give complex roots, while the remainder of the $(\xi_1, \xi_2)$-plane, which consists
of points on and "outside" of the parabola, will give real roots.
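The split of the coefficient plane by the parabola can be illustrated with a few lines of code. This is a minimal sketch with hypothetical helper names (`roots`, `has_real_roots`); it evaluates the root formulas in complex arithmetic so that both cases are handled by the same expression.

```python
import cmath

def roots(xi1, xi2):
    """Roots of eta^2 - xi1*eta + xi2 = 0 from the root formulas (3.2)."""
    disc = cmath.sqrt(xi1 * xi1 / 4.0 - xi2)
    return xi1 / 2.0 + disc, xi1 / 2.0 - disc

def has_real_roots(xi1, xi2):
    """True on and outside the parabola xi2 = xi1^2 / 4."""
    return xi2 <= xi1 * xi1 / 4.0

# Hypothetical coefficient pairs: outside, on, and inside the parabola.
for xi1, xi2 in [(3.0, 2.0), (2.0, 1.0), (2.0, 5.0)]:
    e1, e2 = roots(xi1, xi2)
    print((xi1, xi2), has_real_roots(xi1, xi2), e1, e2)
```

In each case the sum and product of the two roots reproduce the coefficients, and a point inside the parabola yields a conjugate pair.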

We now consider the joint p.d.f., f(x, y), of $\xi_1$ and $\xi_2$, where f(x, y) is of the
continuous type. By truncating along the parabola $\xi_2 = \xi_1^2 / 4$, we obtain
conditional p.d.f.'s relative to the hypotheses $\xi_2 > \xi_1^2 / 4$ and $\xi_2 \le \xi_1^2 / 4$. If we let
$P(R) = P(\xi_2 \le \xi_1^2 / 4)$ and $P(C) = P(\xi_2 > \xi_1^2 / 4)$, then P(R) and P(C) are the
probabilities of real and complex roots, respectively, and are given by

    $P(R) = \iint_{y \le x^2/4} f(x, y)\, dy\, dx \qquad \text{and} \qquad P(C) = \iint_{y > x^2/4} f(x, y)\, dy\, dx.$

The conditional or truncated p.d.f.'s [2] are

    $f(x, y \mid C) = f(x, y) / P(C), \quad y > \dfrac{x^2}{4}; \qquad f(x, y \mid R) = f(x, y) / P(R), \quad y \le \dfrac{x^2}{4}.$

For $\xi_2 \le \xi_1^2 / 4$, the roots of (3.1) are real and have a joint p.d.f. which is
uniquely determined by the p.d.f. of the coefficients $\xi_1$ and $\xi_2$. We will let
$g(v_1, v_2 \mid R)$ denote the p.d.f. of the real roots. The functions (3.2) and (3.3)
satisfy the sufficient conditions given by Cramér [2] for a change of variables
in a continuous type density function. Therefore, we have

    $g(v_1, v_2 \mid R) = f(v_1 + v_2, v_1 v_2)\, |J| / P(R)$

for all $v_1 \ge v_2$, where $|J| = (v_1 - v_2)$.


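Let $g_1(v_1 \mid R)$ and $g_2(v_2 \mid R)$ be the marginal density functions of the real roots
$\eta_1$ and $\eta_2$, respectively. These are given by

    $g_1(v_1 \mid R) = \int_{-\infty}^{v_1} g(v_1, v_2 \mid R)\, dv_2, \qquad \text{and} \qquad g_2(v_2 \mid R) = \int_{v_2}^{\infty} g(v_1, v_2 \mid R)\, dv_1.$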


For $\xi_2 > \xi_1^2 / 4$, the roots of (3.1) are complex conjugates. Let $\eta_1 = \alpha + \beta i$,
then $\eta_2 = \alpha - \beta i$. $\alpha$ and $\beta$ are defined by the functions

    (3.4)    $\alpha = \dfrac{\xi_1}{2}, \qquad \beta = \sqrt{\xi_2 - \dfrac{\xi_1^2}{4}}, \qquad \xi_1 = 2\alpha, \qquad \xi_2 = \alpha^2 + \beta^2.$

$\alpha$ and $\beta$ have a joint p.d.f. which is uniquely determined by the p.d.f. of
$\xi_1$ and $\xi_2$, $f(x, y \mid C)$. Let $h_1(X, Z \mid C)$ denote the p.d.f. of $\alpha$ and $\beta$. The functions
(3.4) satisfy the conditions stated by Cramér [2], so that we may find $h_1(X, Z \mid C)$
by a change of variables in $f(x, y \mid C)$. Therefore, we have

    $h_1(X, Z \mid C) = f(2X, X^2 + Z^2)\, |J| / P(C)$

for all X and all Z > 0, where $|J| = 4Z$.
Similarly, if we let $h_2(X, Z \mid C)$ be the joint p.d.f. of $\alpha$ and $-\beta$, we will have

    $h_2(X, Z \mid C) = f(2X, X^2 + Z^2)\, |J'| / P(C)$

for all X and all Z < 0, where $|J'| = -4Z$.

4. Examples. Numerous cases were considered [3] to the extent of expressing
the marginal p.d.f.'s as integrals. For these cases $\xi_1$ and $\xi_2$ were categorized
according to type of interval over which their p.d.f. was greater than zero.
There are twelve different interval types as follows: $(-\infty, \infty)$, $(0, \infty)$, $(-\infty, 0)$,
$(A, \infty)$, $(-\infty, -A)$, $(-A, \infty)$, $(-\infty, A)$, $(A, B)$, $(-A, -B)$, $(-A, B)$, $(0, A)$,
$(-A, 0)$, where A > 0, B > 0. The various combinations were considered using
the normal, gamma, and rectangular density functions, respectively, and assuming
independence for $\xi_1$ and $\xi_2$ for convenience in obtaining their joint
p.d.f.'s. Some dependent cases were also considered.


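4.1. Example. Bivariate Normal. Let f(x, y) be the general bivariate normal
p.d.f., $n(x, y; \mu_1, \mu_2, \sigma_1, \sigma_2, \rho)$. Then

    (4.1.1)    $g(v_1, v_2 \mid R) = \dfrac{(v_1 - v_2)}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}\, P(R)} \exp\Bigl\{ -\dfrac{1}{2(1 - \rho^2)} \Bigl[ \Bigl(\dfrac{v_1 + v_2 - \mu_1}{\sigma_1}\Bigr)^2$
               $\qquad - 2\rho \Bigl(\dfrac{v_1 + v_2 - \mu_1}{\sigma_1}\Bigr)\Bigl(\dfrac{v_1 v_2 - \mu_2}{\sigma_2}\Bigr) + \Bigl(\dfrac{v_1 v_2 - \mu_2}{\sigma_2}\Bigr)^2 \Bigr] \Bigr\},$

    $-\infty < v_1 < \infty, \qquad -\infty < v_2 \le v_1,$

with

    $P(R) = \int_{-\infty}^{\infty} \int_{-\infty}^{x^2/4} n(x, y; \mu_1, \mu_2, \sigma_1, \sigma_2, \rho)\, dy\, dx.$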

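If we let $u = (x - \mu_1) / \sigma_1$ and $w = (y - \mu_2) / \sigma_2$ we have

    $P(R) = \int_{-\infty}^{\infty} \int_{-\infty}^{\theta'(u)} n(u, w; \rho)\, dw\, du,$

    $\theta'(u) = \bigl[\tfrac{1}{4}(\sigma_1 u + \mu_1)^2 - \mu_2\bigr] / \sigma_2.$

On completing the square on w in the exponent, and substituting

    $t = \dfrac{w - \rho u}{\sqrt{1 - \rho^2}}, \qquad dt = \dfrac{dw}{\sqrt{1 - \rho^2}},$

we obtain

    $P(R) = \int_{-\infty}^{\infty} \int_{-\infty}^{\theta(u)} (2\pi)^{-1} \exp\{-\tfrac{1}{2}(t^2 + u^2)\}\, dt\, du,$

where $\theta(u) = [\theta'(u) - \rho u] / \sqrt{1 - \rho^2}$. Finally, we may write

    (4.1.2)    $P(R) = \int_{-\infty}^{\infty} \phi(u)\, \Phi[\theta(u)]\, du,$

where

    $\Phi(z) = \int_{-\infty}^{z} \phi(t)\, dt$

is the cumulative normal probability function and $\phi(t)$ is the standardized normal
density function. We must employ numerical methods of integration to find
the value of P(R) for a given n(x, y), i.e., for a given set of values for $\mu_1$, $\mu_2$,
$\sigma_1$, $\sigma_2$, and $\rho$.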
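As an illustration of the numerical integration that (4.1.2) calls for, the sketch below uses a plain trapezoidal rule on a truncated range. The function name `prob_real_roots`, the truncation at |u| = 8 and the number of steps are assumptions made for this example; they are not the procedure used in the original computations.

```python
import math

def phi(t):
    """Standardized normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Cumulative normal probability function, via erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def prob_real_roots(mu1, mu2, s1, s2, rho, half_width=8.0, steps=4000):
    """Trapezoidal evaluation of P(R) = integral of phi(u) * Phi(theta(u)) du."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        u = -half_width + i * h
        theta_prime = ((s1 * u + mu1) ** 2 / 4.0 - mu2) / s2
        theta = (theta_prime - rho * u) / math.sqrt(1.0 - rho * rho)
        weight = 0.5 if i in (0, steps) else 1.0   # end points get half weight
        total += weight * phi(u) * Phi(theta)
    return total * h

p0 = prob_real_roots(0.0, 0.0, 1.0, 1.0, 0.0)
p9 = prob_real_roots(0.0, 0.0, 1.0, 1.0, 0.9)
print(p0, p9)
```

With zero means and unit standard deviations the result is symmetric in the sign of $\rho$, and the value at $\rho = 0$ should come out near 0.589.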
For the complex roots we have

    $h_1(X, Z \mid C) = \dfrac{4Z}{P(C)}\, n(2X, X^2 + Z^2), \qquad Z > 0,$

and

    $h_2(X, Z \mid C) = h_1(X, -Z \mid C), \qquad Z < 0,$

where P(C) = 1 - P(R).


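The marginal distributions of the real roots have for density functions

    (4.1.3)    $g_1(v_1 \mid R) = \int_{-\infty}^{v_1} g(v_1, v_2 \mid R)\, dv_2, \qquad -\infty < v_1 < \infty,$

    (4.1.4)    $g_2(v_2 \mid R) = \int_{v_2}^{\infty} g(v_1, v_2 \mid R)\, dv_1, \qquad -\infty < v_2 < \infty.$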


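On substituting (4.1.1) in (4.1.3), expanding the terms in the exponent, and
collecting terms w.r.t. powers of $v_2$, we obtain

    (4.1.5)    $g_1(v_1 \mid R) = \int_{-\infty}^{v_1} \dfrac{(v_1 - v_2)}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}\, P(R)}$
               $\qquad \cdot \exp\Bigl\{ -\dfrac{1}{2(1 - \rho^2)} \bigl[ m_1^2(v_1) \cdot v_2^2 - 2 m_1(v_1) \cdot m_2(v_1) \cdot v_2 + m_3(v_1) \bigr] \Bigr\}\, dv_2,$

where

    $m_1^2(v_1) = (v_1 / \sigma_2 - \rho / \sigma_1)^2 + (1 - \rho^2) / \sigma_1^2,$

    $m_1(v_1) \cdot m_2(v_1) = \rho v_1^2 / \sigma_1\sigma_2 - (1 / \sigma_1^2 + \rho\mu_1 / \sigma_1\sigma_2 - \mu_2 / \sigma_2^2)\, v_1 + (\mu_1 / \sigma_1^2 - \rho\mu_2 / \sigma_1\sigma_2),$

and

    $m_3(v_1) = (v_1 / \sigma_1 - \mu_1 / \sigma_1 + \rho\mu_2 / \sigma_2)^2 + (1 - \rho^2)\mu_2^2 / \sigma_2^2.$

If we carry out the same procedure on (4.1.4) w.r.t. $v_1$, we arrive at

    (4.1.6)    $g_2(v_2 \mid R) = \int_{v_2}^{\infty} \dfrac{(v_1 - v_2)}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}\, P(R)}$
               $\qquad \cdot \exp\Bigl\{ -\dfrac{1}{2(1 - \rho^2)} \bigl[ m_1^2(v_2) \cdot v_1^2 - 2 m_1(v_2) \cdot m_2(v_2) \cdot v_1 + m_3(v_2) \bigr] \Bigr\}\, dv_1.$


TABLE 1
$\rho$ = 0, ±.2, ±.4, ±.6, ±.8, ±.9
(The columns of parameter values are illegible in this copy; the surviving entries are 0, 10, 3, 3, 3.)


TABLE 2
$\mu_1 = \mu_2 = 0$, $\sigma_1 = \sigma_2 = 1$

        $\rho$        P(R)
       -.9      .5237449
       -.8      .5453219
       -.6      .5698161
       -.4      .5872947
       -.2      .5873160
        0       .5890214
        .2      .5873160
        .4      .5872947
        .6      .5698161
        .8      .5453219
        .9      .5237449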

Equations (4.1.2), (4.1.5), and (4.1.6) were evaluated for the various sets of the
parameters shown in Table 1 using the ElectroData digital computer of the
Statistical Laboratory, Purdue University [3]. Table 2 gives the values of P(R)
for the case $\mu_1 = \mu_2 = 0$, $\sigma_1 = \sigma_2 = 1$; a few representative graphs of the
marginal p.d.f.'s, $g_1(v_1 \mid R)$, are shown in Fig. 1 for the same case. The curves
for $g_2(v_2 \mid R)$ are mirror images of those for $g_1$, the symmetry being due to the
fact that $g_1(v_1 \mid R; \mu_1, \mu_2, \sigma_1, \sigma_2, \rho) = g_2(-v_2 \mid R; -\mu_1, \mu_2, \sigma_1, \sigma_2, -\rho)$ and
$v_1 = -v_2$, since $n(x, y; \mu_1, \mu_2, \sigma_1, \sigma_2, \rho) = n(-x, y; -\mu_1, \mu_2, \sigma_1, \sigma_2, -\rho)$.
The tails of the $g_2$ curves are shown as dashed lines in Fig. 1.

Fig. 1. $g_1(v_1 \mid R)$ and $g_2(v_2 \mid R)$ for Example 4.1, $\mu_1 = \mu_2 = 0$, $\sigma_1 = \sigma_2 = 1$.


4.2. Example. A Gamma Type. Let $f(x, y) = \exp\{-x - y\}$, $x \ge 0$, $y \ge 0$.
Then

    $P(R) = \int_0^{\infty} \int_0^{x^2/4} \exp\{-x - y\}\, dy\, dx = 1 - 2e\sqrt{\pi}\,[1 - \Phi(\sqrt{2})] = .24,$

and

    $g(v_1, v_2 \mid R) = \dfrac{(v_1 - v_2)}{.24} \exp\{-(v_1 + v_2 + v_1 v_2)\}, \qquad 0 \le v_2 \le v_1,\ 0 \le v_1 < \infty;$

    $h_1(X, Z \mid C) = \dfrac{4Z}{.76} \exp\{-(2X + X^2 + Z^2)\}, \qquad X \ge 0,\ Z > 0;$

    $h_2(X, Z \mid C) = \dfrac{-4Z}{.76} \exp\{-(2X + X^2 + Z^2)\}, \qquad X \ge 0,\ Z < 0.$

Fig. 2. $g_1(v_1 \mid R)$ and $g_2(v_2 \mid R)$ for Example 4.2.


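Finally,

    $g_1(v_1 \mid R) = \int_0^{v_1} g(v_1, v_2 \mid R)\, dv_2 = \dfrac{1}{.24} \Bigl[ \dfrac{v_1^2 + v_1 - 1}{(1 + v_1)^2} \exp\{-v_1\}$
    $\qquad + (1 + v_1)^{-2} \exp\{-(v_1^2 + 2v_1)\} \Bigr], \qquad 0 \le v_1 < \infty,$

and

    $g_2(v_2 \mid R) = \int_{v_2}^{\infty} g(v_1, v_2 \mid R)\, dv_1 = \dfrac{1}{.24} (1 + v_2)^{-2} \exp\{-(v_2^2 + 2v_2)\}, \qquad 0 \le v_2 < \infty.$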


The frequency curves, $g_1(v_1 \mid R)$ and $g_2(v_2 \mid R)$, are plotted in Fig. 2.
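The closed forms of Example 4.2 lend themselves to a quick numerical check. The sketch below is an illustration under stated assumptions: it uses the identity $1 - \Phi(\sqrt{2}) = \tfrac{1}{2}\,\mathrm{erfc}(1)$, keeps P(R) unrounded rather than using .24, and integrates the two marginal densities with a simple trapezoidal rule on a truncated range. The helper names are hypothetical.

```python
import math

# P(R) = 1 - 2 e sqrt(pi) [1 - Phi(sqrt 2)], with 1 - Phi(sqrt 2) = erfc(1) / 2.
p_real = 1.0 - math.e * math.sqrt(math.pi) * math.erfc(1.0)

def g1(v1, p=p_real):
    """Marginal density of the larger real root."""
    c = 1.0 + v1
    return ((v1 * v1 + v1 - 1.0) * math.exp(-v1)
            + math.exp(-(v1 * v1 + 2.0 * v1))) / (p * c * c)

def g2(v2, p=p_real):
    """Marginal density of the smaller real root."""
    return math.exp(-(v2 * v2 + 2.0 * v2)) / (p * (1.0 + v2) ** 2)

def integrate(f, lo, hi, steps=100000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    inner = sum(f(lo + i * h) for i in range(1, steps))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))

print(p_real, integrate(g1, 0.0, 60.0), integrate(g2, 0.0, 20.0))
```

If the formulas are consistent, P(R) rounds to .24 and each marginal density integrates to a value very close to one.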
NOTES


A NOTE ON WEIGHTED RANDOMIZATION¹
By D. R. Cox²


Princeton University 


Summary. It is shown that in simple statistical designs in which a covariance 
adjustment is made for concomitant variation, an unbiased between-treatment 
mean square can be produced by weighted randomization, i.e., by selecting an 
arrangement at random from a set of arrangements, giving different arrange- 
ments in the set unequal chances of selection. 


1. Introduction. Randomization is one of the key elements in the statistical 
aspects of experimental design [2]. It has as its object the conversion, under 
rather weak assumptions, of uncontrolled variation of whatever form into 
effectively random variation. It thus makes the conclusions drawn from the 
experiment more objective and avoids the introduction of strong, and quite 
often unrealistic, assumptions about the uncontrolled variation. The methods of 
randomization in practical use depend on selecting one arrangement from a set 
giving each arrangement in the set equal chance of selection. For example in a 
randomized block design, the set would usually be that of all randomized block 
designs obtained by permuting the treatments within the chosen grouping of units 
into blocks. An arrangement for use would be one such design selected at random 
out of the set. The purpose of the present note is to point out the theoretical 
advantage in certain cases of choosing from the set with unequal probabilities. 
No recommendation is made about what should be done in practice in such 
situations. 

The following assumption will be made throughout. We have N experimental
units (plots, animals, etc.) and r alternative treatments to be compared, one
treatment being applied to each unit. Suppose that there is a quantity $z_i$
associated with the ith unit and a constant $a_u$ associated with the uth treatment,
such that if the uth treatment is applied to the ith unit, the resulting observation
will be

    (1)    $z_i + a_u,$

independently of the particular allocation of treatments to the other units.
The $z_i$, $a_u$ are indeterminate to within a constant. The object of the experiment
is considered to be the estimation, and possibly significance-testing, of linear
contrasts among the $a_u$. Assumption (1) can easily be generalized without
affecting the arguments that follow; for example a completely random term of
constant mean and variance can be added to (1), but this will not be done here.
The essential points of (1) are that we are measuring on a scale on which 
treatment and unit terms are additive, that the treatment effects are constant, 
and that there is no competition or interference between different units. 

The design is called unbiased [6] if it is possible to calculate from the
observations the following:

(i) unbiased estimates of the linear contrasts among the $a_u$, for example of
the differences, $a_u - a_v$;

(ii) unbiased estimates of the variances of the estimates in (i);

(iii) a mean square between treatments, $s_t^2$, and the mean square for residual,
$s_r^2$, such that the expectation of $s_t^2$ is greater than or equal to that of $s_r^2$, with
equality if and only if $a_1 = \cdots = a_r$.

It is well known that most of the standard designs are unbiased under
Assumption (1); the quantities in (i)-(iii) are calculated by the usual analysis of variance
methods, and expectations are taken over the set of arrangements from which
the one actually used has been selected at random. We shall assume for the
purpose of this paper that one requirement for a design to be satisfactory is
that it should be unbiased.

Four examples of methods of design that are not unbiased in this sense under 
simple randomization are 

(a) a completely randomized, or other simple design, in which adjustments 
for a concomitant variable are made by analysis of covariance [5]; 

(b) the so-called semi-Latin square [6]; 

(c) a Latin square type cross-over design in which Assumption (1) is extended
to allow for simple carry-over of treatment effects from one period to the next
[5], [6];

(d) certain designs in which for some practical reason there is a severe restric- 
tion on the treatment arrangements that are admissible. An example is [1]. 

This note is concerned with (a). 


2. Adjustment for a concomitant variable. Suppose that on each experimental
unit, a concomitant variable is measured, giving a value $x_i$ for the ith unit,
and that $x_1, \cdots, x_N$ are fixed and independent of the allocation of treatments to
units. Nothing is assumed in the randomization analysis about the relation
between the x's and the z's.

Consider to begin with a completely randomized experiment with n units
for each treatment, N = nr. Denote the main observations, given by (1), by
$y_1, \cdots, y_N$. These are random variables depending on the particular
arrangement of treatments selected. Let $\sum_u$ denote summation over those units
receiving the uth treatment. In the usual way define

    (2)    $\bar y_u = \sum_u y_i / n, \qquad \bar x_u = \sum_u x_i / n,$
           $\bar y_. = \sum y_i / N, \qquad \bar x_. = \sum x_i / N,$

and set up the analysis of covariance table

                                   x²          xy          y²
           Between treatments    $B_{xx}$      $B_{xy}$      $B_{yy}$
    (3)    Residual              $R_{xx}$      $R_{xy}$      $R_{yy}$
           Total                 $T_{xx}$      $T_{xy}$      $T_{yy}$

where, for example,

    (4)    $R_{xy} = \sum_{u=1}^{r} \sum_u (x_i - \bar x_u)(y_i - \bar y_u).$

Let $b = R_{xy} / R_{xx}$ and define the adjusted treatment means

    (5)    $\hat y_u = \bar y_u - b(\bar x_u - \bar x_.).$

Also define an estimate of the variance of $\hat y_u - \hat y_v$ by

    (6)    $V(\hat y_u - \hat y_v) = s_r^2 \Bigl\{ \dfrac{2}{n} + \dfrac{(\bar x_u - \bar x_v)^2}{R_{xx}} \Bigr\},$

with

    (7)    $\mathrm{Av}\, V(\hat y_u - \hat y_v) = s_r^2 \Bigl\{ \dfrac{2}{n} + \dfrac{2 B_{xx}}{n(r - 1) R_{xx}} \Bigr\},$

where

    (8)    $s_r^2 = \dfrac{1}{r(n - 1) - 1} \bigl( R_{yy} - R_{xy}^2 / R_{xx} \bigr)$

is the residual mean square of y adjusting for regression on x. Finally the mean
square for treatments adjusting for regression on x is

    (9)    $s_t^2 = \dfrac{1}{r - 1} \Bigl\{ \Bigl( T_{yy} - \dfrac{T_{xy}^2}{T_{xx}} \Bigr) - \Bigl( R_{yy} - \dfrac{R_{xy}^2}{R_{xx}} \Bigr) \Bigr\}.$

The definition of these quantities is based on the least-squares theory of
analysis of covariance; we now consider the randomization theory.


3. A method for calculating expectations under randomization. To investigate
randomization expectations an elegant and powerful method due to Grundy
and Healy [3] will be used. Denote the expectation under simple (unweighted)
randomization of any function, f, of the observations by $E_R(f)$,

    (10)    $E_R(f) = \dfrac{1}{(\text{no. of possible arrangements})} \sum_{\text{all arrangements}} f.$

Consider, as an example of the method, the calculation of $E_R(R_{yy})$, when
the design is completely randomized. It is easily seen from (1) that this
expectation is independent of $a_1, \cdots, a_r$ and is a homogeneous completely symmetric
function of $z_1, \cdots, z_N$ of degree two, invariant under translation of the z's.
Therefore

    (11)    $E_R(R_{yy}) = \alpha \sum_{i=1}^{N} (z_i - \bar z_.)^2,$

identically in $z_1, \cdots, z_N$, where $\alpha$ is a constant. If $z_1, \cdots, z_N$ is a random sample
from a population of variance $\sigma^2$, the expectations of the left- and right-hand
sides of (11) are respectively $r(n - 1)\sigma^2$ and $\alpha(rn - 1)\sigma^2$, whence
$\alpha = r(n - 1) / (rn - 1)$. Thus

    (12)    $E_R(R_{yy}) = \dfrac{r(n - 1)}{rn - 1} \sum_{i=1}^{N} (z_i - \bar z_.)^2.$

The choice of $z_1, \cdots, z_N$ as a random sample in the last step of the argument
has no physical significance and is purely a mathematical device to exploit
knowledge of the behavior of $R_{yy}$ under the usual hypotheses of least-squares
theory; see also [4].
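Result (12) can also be confirmed by brute force for a small design, which is exactly the enumerative calculation the method is meant to avoid. The sketch below enumerates every allocation of N = 6 units to r = 3 treatments with n = 2 units each; the unit values z, the treatment constants a and the helper names are hypothetical choices for this illustration.

```python
from itertools import combinations

def within_ss(values, groups):
    """Residual (within-treatment) sum of squares for one arrangement."""
    total = 0.0
    for g in groups:
        mean = sum(values[i] for i in g) / len(g)
        total += sum((values[i] - mean) ** 2 for i in g)
    return total

def arrangements(units, n):
    """Yield every split of `units` into ordered groups of size n."""
    if not units:
        yield []
        return
    for g in combinations(units, n):
        rest = [i for i in units if i not in g]
        for tail in arrangements(rest, n):
            yield [g] + tail

z = [0.3, 1.7, -2.0, 4.1, 0.9, 2.6]   # hypothetical unit quantities z_i
a = [0.0, 5.0, -3.0]                  # hypothetical treatment constants a_u
n, r = 2, 3
N = n * r

vals = []
for groups in arrangements(list(range(N)), n):
    y = [0.0] * N
    for u, g in enumerate(groups):
        for i in g:
            y[i] = z[i] + a[u]        # observation under Assumption (1)
    vals.append(within_ss(y, groups))

lhs = sum(vals) / len(vals)           # E_R(R_yy) by direct enumeration
zbar = sum(z) / N
rhs = r * (n - 1) / (r * n - 1) * sum((zi - zbar) ** 2 for zi in z)
print(len(vals), lhs, rhs)
```

The two printed expectations agree whatever treatment constants are chosen, since the constants cancel within each treatment group.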

In general the method is to establish the general form of the expectation by
considerations of symmetry and invariance and then to find the precise
expression by special choice of the z's, exploiting our knowledge of what happens
under the conditions of least-squares theory. Thus suppose that we require to
show that for a Latin square the expectations of the mean squares for treatments
and residual are equal, when there are no treatment effects. Considerations of
symmetry and invariance show that both expectations are multiples of the
residual sum of squares of the z's, considered as a row × column arrangement.
Equality of the two expectations under least-squares theory then proves this
equality under general randomization theory.

The result (12) is well known and can be obtained directly without difficulty. 
The point of Grundy and Healy’s method is that it avoids enumerative calcula- 
tions, and its advantage is consequently greater in the more complicated situa- 
tions, such as, for example, in the proofs that Latin squares, balanced incom- 
plete blocks, and so on are unbiased under (1). 


4. The application to covariance adjustments. If we try to calculate the
randomization expectations of $s_r^2$, $s_t^2$, defined in (8) and (9), there is the difficulty
that $R_{xy}^2 / R_{xx}$ is a ratio of random variables so that no simple exact expression
for the form of its expectation can be written down. When $a_1 = \cdots = a_r$, $T_{xx}$,
$T_{xy} = T_{xz}$, $T_{yy} = T_{zz}$ are constant and it follows from (8), (9), and (12) that
$E_R(s_r^2) = E_R(s_t^2)$ if and only if $E_R(R_{xy}^2 / R_{xx})$ is linearly related in a particular way
to $T_{yy}$ and $T_{xy}^2 / T_{xx}$. Consideration of the form of $E_R(R_{xy}^2 / R_{xx})$ shows that no
such relation can hold identically in the x's and the z's. Hence there is, in general,
bias, although we always have

    (13)    $E_R(\hat y_u - \hat y_v) = a_u - a_v.$


The bias arises from the factor $1/R_{xx}$ in (8) and so it is natural to try to
remove the bias by weighting each arrangement of treatments proportionally
to $R_{xx}$. This is the general idea behind the following considerations. Suppose
that the values $x_1, \cdots, x_N$ are available to the experimenter prior to the
allocation of treatments to units. Let w be any non-negative function of $x_1, \cdots, x_N$,
defined for each arrangement of treatments within the set in which each
treatment occurs n times. Let an arrangement be selected for use giving each design
in the set a probability of selection proportional to w; we shall call this a process
of weighted randomization using w as weight function. If f is any function of
the observations y and x, its expectation under weighted randomization is
$E_W(f)$, where

    (14)    $E_W(f) = E_R(wf) / E_R(w).$


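Let $w = R_{xx}$ and consider $E_W(s_r^2)$. By (14) we need to know $E_R(R_{xx}R_{yy} - R_{xy}^2)$.
This is independent of $a_1, \cdots, a_r$ and is homogeneous and of degree two in
$x_1, \cdots, x_N$ and in $z_1, \cdots, z_N$ separately, and is unaffected by interchanging
the x's with the z's. Hence

    (15)    $E_R(R_{xx}R_{yy} - R_{xy}^2) = A \bar x_.^2 \bar z_.^2 + B(\bar x_.^2 T_{zz} + \bar z_.^2 T_{xx}) + C T_{xx} T_{zz} + D T_{xz}^2$
            $\qquad + E \sum_{i=1}^{N} (x_i - \bar x_.)^2 (z_i - \bar z_.)^2 + F \bar x_. \bar z_. T_{xz}$
            $\qquad + G \Bigl\{ \bar x_. \sum_{i=1}^{N} (x_i - \bar x_.)(z_i - \bar z_.)^2 + \bar z_. \sum_{i=1}^{N} (x_i - \bar x_.)^2 (z_i - \bar z_.) \Bigr\},$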

where A, ···, G are constants. The simplest way of verifying (15) from first
principles is to note that

    $\sum x_i^2 z_i^2, \qquad \sum (x_i^2 z_i z_j + x_i x_j z_i^2), \qquad \sum (x_i^2 z_j z_k + x_j x_k z_i^2),$
    $\sum x_i^2 z_j^2, \qquad \sum x_i x_j z_i z_j, \qquad \sum x_i x_j z_i z_k, \qquad \sum x_i x_j z_k z_l$

are the seven types of sum with the requisite degree of symmetry and that the
right-hand side of (15) has seven arbitrary constants. $R_{xx}R_{yy} - R_{xy}^2$ is unaffected
by changing $x_i$ to $x_i + a$ and $z_i$ to $z_i + b$, i = 1, ···, N. Since this is true
identically in the x's and z's, A = B = F = G = 0. Next, if $z_i = \lambda x_i$, i =
1, ···, N, $R_{xx}R_{yy} - R_{xy}^2$ is identically zero, so that

    (16)    $0 = \lambda^2 \Bigl\{ C T_{xx}^2 + D T_{xx}^2 + E \sum_{i=1}^{N} (x_i - \bar x_.)^4 \Bigr\},$

whence E = 0, C = -D. If we combine this result with the expression for
$E_R(R_{xx})$ corresponding to (12), we have

    (17)    $E_W(s_r^2) = H (T_{zz} - T_{xz}^2 / T_{xx}),$

where H is a constant. Finally let $x_1, \cdots, x_N$ be arbitrary but fixed and let
the $z_i$ be uncorrelated random variables with means $\beta x_i$ and constant variance
$\sigma^2$. If E denotes expectation over their distribution,

    $E(T_{zz} - T_{xz}^2 / T_{xx}) = (nr - 2)\sigma^2,$

    $E(s_r^2) = \sigma^2$

by the ordinary theory of regression. Finally, since $E E_W(s_r^2) = E_W E(s_r^2)$, (17)
leads to H = 1/(nr - 2), so that

    (18)    $E_W(s_r^2) = \dfrac{1}{nr - 2} \Bigl( T_{zz} - \dfrac{T_{xz}^2}{T_{xx}} \Bigr) = \sigma_{z \cdot x}^2,$

say.
Similarly if $a_1 = \cdots = a_r$,

    (19)    $E_W(s_t^2) = \sigma_{z \cdot x}^2,$

the unbiased property. When the a's are not all equal, a multiple of their
corrected sum of squares is added to (19); details will not be given.
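The exact results (18) and (19) can be checked by enumerating all arrangements of a small completely randomized design and weighting each by its residual sum of squares of x. The sketch below takes r = 2, n = 3 and equal treatment effects (so that y may be replaced by z); the data values and helper names are hypothetical.

```python
from itertools import combinations

def arrangements(units, n):
    """Yield every split of `units` into ordered groups of size n."""
    if not units:
        yield []
        return
    for g in combinations(units, n):
        rest = [i for i in units if i not in g]
        for tail in arrangements(rest, n):
            yield [g] + tail

def residual_sums(x, z, groups):
    """Residual sums of squares and products R_xx, R_xz, R_zz."""
    rxx = rxz = rzz = 0.0
    for g in groups:
        mx = sum(x[i] for i in g) / len(g)
        mz = sum(z[i] for i in g) / len(g)
        for i in g:
            rxx += (x[i] - mx) ** 2
            rxz += (x[i] - mx) * (z[i] - mz)
            rzz += (z[i] - mz) ** 2
    return rxx, rxz, rzz

x = [0.5, 1.0, 2.5, 3.0, 4.5, 7.0]    # hypothetical concomitant values
z = [1.2, -0.4, 3.3, 0.8, 5.1, 2.0]   # hypothetical unit quantities
n, r = 3, 2
N = n * r
mx, mz = sum(x) / N, sum(z) / N
txx = sum((v - mx) ** 2 for v in x)
tzz = sum((v - mz) ** 2 for v in z)
txz = sum((x[i] - mx) * (z[i] - mz) for i in range(N))
adj_total = tzz - txz ** 2 / txx

num_r = num_t = den = 0.0
for groups in arrangements(list(range(N)), n):
    rxx, rxz, rzz = residual_sums(x, z, groups)
    if rxx == 0.0:
        continue                       # weight zero: never selected
    adj_resid = rzz - rxz ** 2 / rxx
    s_r2 = adj_resid / (r * (n - 1) - 1)           # residual mean square
    s_t2 = (adj_total - adj_resid) / (r - 1)       # adjusted treatment mean square
    num_r += rxx * s_r2
    num_t += rxx * s_t2
    den += rxx

ew_sr2, ew_st2 = num_r / den, num_t / den          # E_W = E_R(w f) / E_R(w)
target = adj_total / (n * r - 2)
print(ew_sr2, ew_st2, target)
```

All three printed numbers coincide, which is the unbiased property for the weight function $R_{xx}$.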

We now investigate the corresponding theory for the variance and estimated
variance of the difference between two adjusted treatment means, $\hat y_u - \hat y_v$,
say. It does not seem possible to obtain exact results corresponding to those for
$s_r^2$ and $s_t^2$ and we shall need to use the following asymptotic results. If, as N
tends to infinity, f and g are random variables, functions of the y's and the x's
with fixed means and with variance of order 1/N, then

    (20)    $E(fg) = E(f)E(g) + O(1/N),$

and, under weak conditions on g,

    (21)    $E(f/g) = E(f) / E(g) + O(1/N).$

These will be used with E standing for $E_R$ or $E_W$ as convenient. The expectation
on the left of (21) is to be taken as referring to the asymptotic distribution
of f/g.

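Now we have from (7), (18), and (20) that

    (22)    $E_W[\mathrm{Av}\, V(\hat y_u - \hat y_v)] \simeq \dfrac{2\sigma_{z \cdot x}^2}{n} + \dfrac{2\sigma_{z \cdot x}^2}{n(r - 1)}\, E_W\Bigl(\dfrac{B_{xx}}{R_{xx}}\Bigr)$

    (23)    $\qquad = \dfrac{2\sigma_{z \cdot x}^2}{n} \Bigl\{ 1 + \dfrac{1}{r(n - 1)} \Bigr\}.$

If r is fixed and n tends to infinity, the relative error in (20) is of order $1/N^2$.
In obtaining (22) we have assumed that the x's and the z's are such as to make
the variances of $s_r^2$ and $B_{xx} / R_{xx}$ of order 1/N.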

Similarly to find the actual variance of $\hat y_u - \hat y_v$ we have that

    (24)    $\mathrm{Av}\{(\hat y_u - \hat y_v) - (a_u - a_v)\}^2 = \mathrm{Av}\Bigl\{ (\bar z_u - \bar z_v) - \dfrac{R_{xz}}{R_{xx}} (\bar x_u - \bar x_v) \Bigr\}^2$
            $\qquad = \dfrac{2}{n(r - 1)} \Bigl( B_{zz} - \dfrac{2 R_{xz} B_{xz}}{R_{xx}} + \dfrac{R_{xz}^2 B_{xx}}{R_{xx}^2} \Bigr).$

If we apply the operator $E_W$ to (24), we get the required variance. The
expectation can be evaluated in a way similar to (23), dealing with the last term by
(21). The final answer is the right-hand side of (23). That is, to the order
indicated above, (7) is an unbiased estimate of the average variance of $\hat y_u - \hat y_v$.


5. Extensions. The calculations in Section 4 have, for simplicity, been made
for the completely randomized design. However the results can be extended to
designs such as randomized blocks and Latin squares; weighting proportional
to the residual sum of squares of x again gives an unbiased treatment mean
square. Another generalization is to multiple analysis of covariance, in which
the treatment means are adjusted for k concomitant variables $x_1, \cdots, x_k$.
The appropriate weighting function is then the residual generalized variance,
i.e., the determinant $|R_{ij}|$, where $R_{ij}$ is the residual sum of products of $x_i$ and $x_j$.


6. Discussion. The idea of weighted randomization discussed above is probably 
solely of theoretical interest, at any rate in the context considered here. A full 
discussion of possible practical applications would require further work, but the 
following points are worth making. 

(i) The bias in unweighted randomization is probably small, except possibly
when N is very small and the correlation between the x's and the z's very
non-linear. Further work is needed, however, to find the likely magnitude of the
bias in typical cases.

(ii) Weighted randomization is perhaps most likely to be of practical value
when a series of similar experiments are planned, each with a small value of N.
Another possible application is to Latin square designs in which it is desired to
control variations diagonally across the square, in addition to row and column
variation. This can be done by inserting a suitable concomitant variable, for
example the product of row number and column number suitably coded. Weighted
randomization would justify such a method in the same way that ordinary
randomization justifies the conventional use of the Latin square.

(iii) Arrangements with a large value of $R_{xx}$ will have a small value for $B_{xx}$,
and conversely. Hence the weighting proportional to $R_{xx}$ attaches greater
chance of selection to those arrangements in which the treatment groups are
balanced with respect to the mean value of x.

(iv) If weighted randomization is to be done in practice with N not very
small, some short-cut method is needed for selecting an arrangement, since
the enumeration of all arrangements and the calculation of $R_{xx}$ for each would
usually be too tedious. Professor J. W. Tukey has pointed out that weighted
randomization can be done reasonably simply as follows. Let M be the maximum
over all arrangements of $R_{xx}$. Select an arrangement by unweighted randomization
and calculate $R_{xx}$ for it. Reject the arrangement with probability
$1 - R_{xx} / M$. Continue until an arrangement is accepted.

(v) Weighted randomization is, of course, restricted to cases in which the 
concomitant variable is available prior to the allocation of treatments to units. 
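The rejection procedure in point (iv) can be sketched as follows. One deliberate substitution is made and should be read as an assumption of this example: instead of the maximum M of the residual sum of squares over all arrangements, the code uses the total sum of squares of x as the bound, since it is never smaller than any residual sum of squares and leaves the acceptance probabilities proportional to it. All function names are hypothetical.

```python
import random

def residual_ss(x, groups):
    """Residual sum of squares of x for one arrangement."""
    total = 0.0
    for g in groups:
        mean = sum(x[i] for i in g) / len(g)
        total += sum((x[i] - mean) ** 2 for i in g)
    return total

def random_arrangement(N, n, rng):
    """Unweighted randomization: shuffle units, cut into groups of size n."""
    units = list(range(N))
    rng.shuffle(units)
    return [tuple(units[k:k + n]) for k in range(0, N, n)]

def weighted_arrangement(x, n, rng, bound=None):
    """Draw an arrangement with probability proportional to its residual SS."""
    N = len(x)
    if bound is None:
        mean = sum(x) / N
        bound = sum((xi - mean) ** 2 for xi in x)   # total SS, an upper bound
    if bound <= 0.0:
        return random_arrangement(N, n, rng)        # all x equal: weights degenerate
    while True:
        groups = random_arrangement(N, n, rng)
        # accept with probability R_xx / bound, i.e. reject with 1 - R_xx / bound
        if rng.random() < residual_ss(x, groups) / bound:
            return groups

rng = random.Random(12345)
x = [0.0, 0.0, 10.0, 10.0]             # hypothetical concomitant values
draws = [weighted_arrangement(x, 2, rng) for _ in range(200)]
print(draws[:3])
```

In this toy case the arrangement that puts the two equal x values in the same group has zero residual sum of squares, so it is never accepted. A looser bound lowers the acceptance rate but does not change the selection probabilities.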







REFERENCES 


[1] D. R. Cox, "The design of an experiment in which certain treatment arrangements
are inadmissible," Biometrika, Vol. 41 (1954), pp. 287-295.

[2] R. A. Fisher, The Design of Experiments, 5th ed., Oliver and Boyd, Ltd., 1949.

[3] P. M. Grundy and M. J. R. Healy, "Restricted randomization and quasi-Latin
squares," J. Roy. Stat. Soc. (B), Vol. 12 (1950), pp. 286-291.

[4] J. O. Irwin and M. G. Kendall, "Sampling moments of moments for a finite
population," Ann. Eugenics, Vol. 12 (1943-45), pp. 138-142.

[5] O. Kempthorne, The Design and Analysis of Experiments, J. Wiley and Sons, 1952.

[6] F. Yates, "Bases logiques de la planification des expériences," Ann. Inst. H. Poincaré,
Vol. 12 (1951), pp. 97-112.




ON STOCHASTIC APPROXIMATION METHODS¹
By J. Wolfowitz
Cornell University


In [1] A. Dvoretzky proved the theorem quoted below, which implies all
previous results on the convergence to a limit of stochastic approximation
methods. (For a description of these results see [1].) In the present note we give
a simple and, we think, perspicuous proof of this theorem which may be of help
in further work. The present note is entirely self-contained and may be read
without reference to [1].

THEOREM. (Dvoretzky) Let $\alpha_n$, $\beta_n$ and $\gamma_n$ $(n = 1, 2, \cdots)$ be non-negative real
numbers satisfying

    (1)    $\lim_{n \to \infty} \alpha_n = 0,$

    (2)    $\sum_{n=1}^{\infty} \beta_n < \infty,$

    (3)    $\sum_{n=1}^{\infty} \gamma_n = \infty.$

Let $\theta$ be a real number and $T_n$ $(n = 1, 2, \cdots)$ be measurable transformations
satisfying

    (4)    $|T_n(r_1, \cdots, r_n) - \theta| \le \max[\alpha_n, (1 + \beta_n)|r_n - \theta| - \gamma_n]$

for all real $r_1, \cdots, r_n$. Let $X_1$ and $Y_n$ $(n = 1, 2, \cdots)$ be random variables and
define²

    (5)    $X_{n+1}(\omega) = T_n(X_1(\omega), \cdots, X_n(\omega)) + Y_n(\omega)$

for $n \ge 1$.
Then the conditions $E\{X_1^2\} < \infty$,

    (6)    $\sum_{n=1}^{\infty} E\{Y_n^2\} < \infty$

and

    (7)    $E\{Y_n \mid X_1, \cdots, X_n\} = 0$

with probability 1 for all n, imply

    (8)    $\lim_{n \to \infty} E\{(X_n - \theta)^2\} = 0$

and

    (9)    $P\{\lim_{n \to \infty} X_n = \theta\} = 1.$

Received December 16, 1955.

¹ This research was supported by the United States Air Force under Contract No.
AF18(600)-685 monitored by the Office of Scientific Research.

² In the proof of the theorem we will, for the sake of brevity, write $T_n(X_n)$ for
$T_n(X_1, \cdots, X_n)$, just as is done in [1]. No ambiguity will be caused by this.
EXTENSION. The theorem remains valid if $\alpha_n$ and $\beta_n$ in (4) are replaced by
non-negative functions $\alpha_n(r_1, \cdots, r_n)$ and $\beta_n(r_1, \cdots, r_n)$ respectively, provided:
The functions $\alpha_n(r_1, \cdots, r_n)$ are uniformly bounded and

    (10)    $\lim_{n \to \infty} \alpha_n(r_1, \cdots, r_n) = 0$

uniformly for all sequences $r_1, \cdots, r_n, \cdots$; the functions $\beta_n(r_1, \cdots, r_n)$ are
measurable and

    (11)    $\sum_{n=1}^{\infty} \beta_n(r_1, \cdots, r_n)$

is uniformly bounded and uniformly convergent for all sequences $r_1, \cdots, r_n, \cdots$;
and for any L > 0 there exist non-negative functions $\gamma_n(r_1, \cdots, r_n)$ satisfying
(4), and

    (12)    $\sum_{n=1}^{\infty} \gamma_n(r_1, \cdots, r_n) = \infty$

holds uniformly for all sequences $r_1, \cdots, r_n, \cdots$ for which

    (13)    $\sup_{n = 1, 2, \cdots} |r_n| < L.$

PROOF: Without loss of generality we may take $\theta = 0$.

I. From (4) and (6) it follows readily that $EX_n^2 < \infty$ for any $n$.

II. Define $s(n)$ to be the sign of $[T_n(X_n)][X_n]$ if neither factor is zero, and $s(n) = 1$ if either factor is zero. Define $\pi(m, n) = \prod_{j=m}^{n} s(j)$, $Y'_n = \pi(1, n) Y_n$. The series $\sum Y'_n$ converges w.p.1, by Loève ([2], p. 387, D) and (6) and (7). Let

$Z(m, n) = \sum_{j=m}^{n} Y'_j.$
For any $\delta$ and $\epsilon$ both $> 0$, there exists $M'(\delta, \epsilon)$ such that
( 
(14) P4 sup |Z(m,n)| > 3\ < =e 
ie , 
III. Let $d(m, m - 1) = 1$ and, for $n \ge m$,

$d(m, n) = \prod_{j=m}^{n} (1 + \beta_j).$

Consider the sum

$S(m, n) = \sum_{j=m}^{n+1} d(j, n)\, Y'_{j-1},$

which is equal to

(15) $\sum_{j=m}^{n-1} Z((m - 2), (j - 1))[d(j, n) - d(j + 1, n)] - Y'_{m-2}\, d(m, n) + Z((m - 2), (n - 1))\, d(n, n) + Y'_n.$

Since $d(j, n) \ge d(j + 1, n)$ we have that the absolute value of (15) is not greater than

$2\,[\sup_{m-1 \le j \le n} |Z((m - 2), (j - 1))|]\, d(m, n) + |Y'_n|.$
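
The summation by parts behind (15) can be checked numerically. The sketch below takes $S(m, n) = \sum_{j=m}^{n+1} d(j, n) Y'_{j-1}$ with $d(n+1, n) = 1$, which is the reading assumed here, and uses arbitrary made-up values for the $\beta_j$ and $Y'_j$; the function names are hypothetical.

```python
import random

rng = random.Random(1)
N = 12
beta = [0.0] + [rng.random() * 0.1 for _ in range(N + 2)]    # beta[j] >= 0, indexed from 1
yp = [0.0] + [rng.uniform(-1.0, 1.0) for _ in range(N + 2)]   # stand-ins for Y'_j

def d(m, n):
    """d(m, n) = prod_{j=m}^{n} (1 + beta_j); the empty product d(m, m-1) is 1."""
    out = 1.0
    for j in range(m, n + 1):
        out *= 1.0 + beta[j]
    return out

def Z(m, n):
    """Z(m, n) = sum_{j=m}^{n} Y'_j."""
    return sum(yp[j] for j in range(m, n + 1))

def S(m, n):
    """S(m, n) = sum_{j=m}^{n+1} d(j, n) * Y'_{j-1}."""
    return sum(d(j, n) * yp[j - 1] for j in range(m, n + 2))

def rhs15(m, n):
    """Expression (15), the result of summing S(m, n) by parts."""
    total = sum(Z(m - 2, j - 1) * (d(j, n) - d(j + 1, n)) for j in range(m, n))
    return total - yp[m - 2] * d(m, n) + Z(m - 2, n - 1) * d(n, n) + yp[n]

def bound(m, n):
    """2 * sup_{m-1 <= j <= n} |Z(m-2, j-1)| * d(m, n) + |Y'_n|."""
    sup_z = max(abs(Z(m - 2, j - 1)) for j in range(m - 1, n + 1))
    return 2.0 * sup_z * d(m, n) + abs(yp[n])

m, n = 3, 10
print(S(m, n), rhs15(m, n), bound(m, n))
```

The first two printed numbers should agree up to rounding, and the third should dominate their absolute value.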


Hence, from (11) and (14) it follows that, for 6 and « both >0, there exists an 
M" (6, «) = M’'(6, €) such that d(m, ~) < } for m = M” and 


( 
(16) ry sup | Z(m, n) | <_ sup | S(m,n)| < i> 1 - 5 


M/’<msn M’’<msn 


Proof of (9) under the conditions of the extension. Let $\epsilon$ and $\delta$ be positive and arbitrary. It is sufficient to prove that

(17) $P\{|X_n| < \delta$ for all $n$ sufficiently large$\} > 1 - \epsilon.$

Let $M = M''(\delta, \epsilon)$ be so large that, for $n \ge M$, $\alpha_n < \delta/8$. Let $L$ be so large that $L > \delta$ and


2 eL? 
(18) am SS < 39M ° 


We take this to be the L for which (12) holds. It also follows that 


(19) ae Xi 32h >1- $. 
isis 4 2 


Suppose that the following four conditions are fulfilled: 


(20) The relations in curly brackets in (16); 
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|Xn| forsome m2 M; 


| X mtj \ > 


6 
f ’ 


lsjsk; 


. 6 
| Xmpesi| Sz. 


+ 


Here $1 \le k \le \infty$. In case $k = \infty$, (22) is to hold for all $j \ge 1$ and (23) is vacuous. (It will be clear by the time the proof is finished that $k$ cannot $= \infty$.) Because $\alpha_n < \delta/8$ for $n \ge M$ and because of (20), (21), and (22) it follows that

(24) $|T_{m+j}(X_{m+j})| > \alpha_{m+j}, \quad 0 \le j \le k - 1,$

(25) $\operatorname{sign} X_{m+j+1} = \operatorname{sign} T_{m+j}(X_{m+j}), \quad 0 \le j \le k - 1.$

Applying (4) (with the $\gamma$'s zero) we obtain that $X_{m+1}$ lies between zero and

(26) $s(m)(1 + \beta_m) X_m + Y_m.$

Repeating this argument, we obtain that, for $1 \le j \le k$, $X_{m+j}$ lies between 0 and

(27) $s(m+j-1)\, s(m+j-2) \cdots s(m)\, d(m, m+j-1)\, X_m + s(m+j-1) \cdots s(m+1)\, d(m+1, m+j-1)\, Y_m + \cdots + s(m+j-1)\, d(m+j-1, m+j-1)\, Y_{m+j-2} + Y_{m+j-1}.$

The absolute value of (27) is not greater than

(28) $|X_m|\, d(m, m+j-1) + |S(m+1, m+j-1)|.$

Hence

(29) $|X_{m+j}| < \delta, \quad 1 \le j \le k.$


To prove (17) it remains only to show that the following conditions cannot 
both hold: 


(30) the relations in curly brackets in (16) and (19); 
(31) ixal > : forall n= M. 


Applying the argument of the previous paragraph with $\delta$ replaced by $L$ we obtain that

(32) $|X_n| < L$ for all $n \ge 1$.

Hence (12) holds. In view of (30) and (31) it follows that

(33) $|T_n(X_n)| > \alpha_n$ for all $n \ge M - 1$,

(34) $\operatorname{sign} T_n(X_n) = \operatorname{sign} X_{n+1}$ for all $n \ge M - 1$.
We may now, and do, apply the argument which led to (28), but with the $\gamma$'s which satisfy (12). We conclude that, for all $n > M$, $|X_n|$ is not greater than

(35) $|X_M|\, d(M, n-1) + |S(M+1, n-1)| - \sum_{j=M}^{n-1} \gamma_j.$

For $n$ sufficiently large this becomes negative, contradicting (33) and hence (31). This completes the proof of (9).
The fact that $EX_1^2 < \infty$ is used in the above proof only in order that $EX_n^2 < \infty$ for all $n$, and this latter fact is needed only for (8), and not for (9). For in the
proof above we used the fact that EX% < © only to obtain explicitly an L 
for which (19) holds. Such an L obviously exists whether or not EX, <. 
Proof of (8) under the conditions of the extension. Let K = maxi< jew aj. 
Let N be an integer to be chosen later. In view of (9) we have only to prove that 
lim... E{(|X,| — K)*}* = 0. Let P denote probability measure and A be 
any set in the sample space which can be defined in terms of X,,--- , Xm. 
We use the inequality 


H(A) = f ((\Xmas| — KY‘ dP = ff ((Ta(Xw) + Yu | — K)*) dP 


(36) < i [¥2 + ((| Tm(Xm) | — K)*)] dP 


< i [Y2, + KBn(1 + KBm) + (1 + Bm)*(1 + KBm)((| Xm | — K)*)*] dP 


which is in [1] and can be deduced from (4) and (7). Let $B(j)$ be the set $\{|X_{N+j}| \le K,\ |X_{N+i}| > K$ for $0 \le i < j\}$, $D(j)$ the complement of

$B(0) + B(1) + \cdots + B(j).$

Iterate the inequality (36) to obtain an upper bound on $H_n(A)$, $n > N$, beginning the iteration at $m = N, N+1, \cdots, n-1$, respectively, and using as $A$ the sets $B(0), B(1), \cdots, B(n - N - 1)$, respectively. In each case the last term of the integrand of the right member of (36) vanishes. Adding, we obtain that $H_n(B(0) + \cdots + B(n - N))$ can be made arbitrarily small by making $N$ sufficiently large.

It remains only to consider H,(D(n — N)). For any point in D(n — N) we 
have, as in (27), that 00 


(37) |\X,| S |x(1, N — 1) d(N,n — 1)Xy + S(N + 1,2 — 1) 
Hence, by Minkowski’s inequality 


(J, _,X a) 


<a, <0(f  cw' ar) + tar, = (Sev3) 


j=N 


(38) 
The second term on the right of (38) can be made arbitrarily small by making $N$ sufficiently large. The first term can be made arbitrarily small by making $n$ sufficiently large, since $P\{D(n - N)\} \to 0$ as $n \to \infty$. This completes the proof of (8).
ON THE DERIVATIVES OF A CHARACTERISTIC FUNCTION
AT THE ORIGIN 


By E. J. G. Pitman 
University of Tasmania 

1. Introduction. Let $F(x)$, $-\infty < x < \infty$, be a distribution function, and

$\phi(t) = \int_{-\infty}^{\infty} e^{itx}\, dF(x)$

its characteristic function, defined and continuous for all real $t$. Let $k$ be a positive integer. If the $k$th moment of $F(x)$,

$\mu_k = \int_{-\infty}^{\infty} x^k\, dF(x),$

exists and is finite (integral absolutely convergent), $\phi(t)$ has a finite $k$th derivative for all real $t$ given by

$\phi^{(k)}(t) = i^k \int_{-\infty}^{\infty} x^k e^{itx}\, dF(x).$

In particular,

$\phi^{(k)}(0) = i^k \mu_k.$

The existence and finiteness of $\mu_k$ is a sufficient condition for the existence and finiteness of $\phi^{(k)}(0)$. It can be shown (see [1]) that when $k$ is even, this condition is also necessary; but when $k$ is odd this is not so. Zygmund [2] has given a necessary and sufficient condition for the existence of $\phi'(0)$ and also one for the existence of a symmetric derivative of higher odd order at $t = 0$; but he imposes a certain condition (smoothness) on the characteristic function. In the following theorem the conditions are on the distribution function only.


Received October 5, 1955 
2. Statement of Results.

THEOREM. Let $k$ be an odd positive integer. Necessary and sufficient conditions for the existence of $\phi^{(k)}(0)$ are:

(i) $\lim_{x \to \infty} x^k \{F(-x) + 1 - F(x)\} = 0,$

(ii) $\lim_{T \to \infty} \int_{-T}^{T} x^k\, dF(x)$ exists.

When these two conditions are satisfied,

$\phi^{(k)}(0) = i^k \lim_{T \to \infty} \int_{-T}^{T} x^k\, dF(x).$

If $X$ is a random variable with distribution function $F(x)$, so that

$F(x) = P\{X \le x\},$

condition (i) may be stated in the form

$\lim_{x \to \infty} x^k [P\{X \le -x\} + P\{X > x\}] = 0.$

A condition which is easily proved equivalent is

$\lim_{x \to \infty} x^k P\{|X| \ge x\} = 0.$

3. Two lemmas.

LEMMA 1. If $G(x)$ is defined and non-decreasing for $x \ge 0$, and if $k > 0$, the four statements below are equivalent, i.e., any one implies the other three.

(1) $\lim_{T \to \infty} T^k \int_T^{\infty} dG(x) = 0$;

(2) $\lim_{T \to \infty} \frac{1}{T} \int_0^T x^{k+1}\, dG(x) = 0$;

(3) $\lim_{T \to \infty} T \int_T^{\infty} x^{k-1}\, dG(x) = 0$;

(4) $\lim_{T \to \infty} T \int_0^{\infty} x^{k-1} \sin^2 (x/T)\, dG(x) = 0$.

Suppose (1) is true. Put

$H(x) = \int_x^{\infty} dG(z) = G(\infty) - G(x).$

Then $T^k H(T) \to 0$ when $T \to \infty$, and

$\frac{1}{T} \int_0^T x^{k+1}\, dG(x) = -\frac{1}{T} \int_0^T x^{k+1}\, dH(x) = -T^k H(T) + \frac{(k+1) \int_0^T x^k H(x)\, dx}{T},$

both terms of which $\to 0$ as $T \to \infty$ if $T^k H(T) \to 0$, and so (2) is true. Now

$\frac{1}{T} \int_0^{2T} x^{k+1}\, dG(x) \ge \frac{1}{T} \int_T^{2T} x^{k+1}\, dG(x) \ge T^k \int_T^{2T} dG(x).$


When (2) is true, the first term in the inequality $\to 0$ as $T \to \infty$, and therefore so does the last, i.e.,

(5) $W(T) = T^k \{G(2T) - G(T)\} \to 0$ as $T \to \infty$.

$T^k \int_T^{\infty} dG(x) = T^k \sum_{n=1}^{\infty} \{G(2^n T) - G(2^{n-1} T)\} = \sum_{n=1}^{\infty} 2^{-(n-1)k}\, W(2^{n-1} T).$

Because of (5), $W(T)$ is bounded for $T \ge 0$, and therefore this series is uniformly convergent with respect to $T \ge 0$. When $T \to \infty$, each term $\to 0$, and therefore (1) is true. Thus (2) implies (1).
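Suppose again that (1) is true. Put

$A(T) = \sup\, [x^k H(x);\ x \ge T].$

Then $A(T) \to 0$ as $T \to \infty$, and

$T \int_T^{\infty} x^{k-1}\, dG(x) = -T \int_T^{\infty} x^{k-1}\, dH(x) = T^k H(T) + (k - 1)\, T \int_T^{\infty} x^{k-2} H(x)\, dx$

$\le T^k H(T) + |k - 1|\, T A(T) \int_T^{\infty} x^{-2}\, dx = T^k H(T) + |k - 1|\, A(T),$

which $\to 0$ as $T \to \infty$. Thus (1) implies (3).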
The converse of this is not actually used in this paper; but there is some interest in stating and proving it so as to round out the lemma. If $k \ge 1$,

$T \int_T^{\infty} x^{k-1}\, dG(x) \ge T^k \int_T^{\infty} dG(x),$

and so (3) implies (1) in this case. We now suppose $0 < k < 1$. Now

$T \int_T^{2T} x^{k-1}\, dG(x) \ge 2^{k-1} T^k \int_T^{2T} dG(x).$

If (3) is true, the first term in the inequality $\to 0$ as $T \to \infty$, and therefore so does the second. (5) is then true, and this, as shown above, implies (1). Next,


$T \int_0^{\infty} x^{k-1} \sin^2 (x/T)\, dG(x) = T \int_0^T + T \int_T^{\infty} = I_1 + I_2;$

$I_1 = T^{-1} \int_0^T x^{k+1} \left( \frac{\sin (x/T)}{x/T} \right)^2 dG(x);$

$\sin^2 1 \cdot T^{-1} \int_0^T x^{k+1}\, dG(x) \le I_1 \le T^{-1} \int_0^T x^{k+1}\, dG(x).$

Hence $I_1 \to 0$ as $T \to \infty$ if and only if (2) is true. Thus (4) implies (2) which implies (1). Also

$I_2 \le T \int_T^{\infty} x^{k-1}\, dG(x),$

and so $\to 0$ as $T \to \infty$ if (3) is true. Thus (1), which implies (2) and (3), implies (4).
LEMMA 2. When the statements 1-4 of Lemma 1 are true,

$T \int_0^{\infty} x^{k-1} \sin (x/T)\, dG(x) - \int_0^T x^k\, dG(x) \to 0$ as $T \to \infty$.

This function of $T$ is equal to

$T \int_T^{\infty} x^{k-1} \sin (x/T)\, dG(x) - \int_0^T x^k \left( 1 - \frac{\sin (x/T)}{x/T} \right) dG(x),$

which has a modulus not greater than

$T \int_T^{\infty} x^{k-1}\, dG(x) + \int_0^T x^k \cdot \frac{x^2}{6T^2}\, dG(x) \le T \int_T^{\infty} x^{k-1}\, dG(x) + \tfrac{1}{6}\, T^{-1} \int_0^T x^{k+1}\, dG(x).$

This $\to 0$ as $T \to \infty$ because of (3) and (2).
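4. Proof of theorem. If $\phi_0(t)$, $\phi_1(t)$ are the real and imaginary parts of $\phi(t)$,

$\phi(t) = \phi_0(t) + i \phi_1(t),$

$\phi_0(t) = \int_{-\infty}^{\infty} \cos tx\, dF(x), \qquad \phi_1(t) = \int_{-\infty}^{\infty} \sin tx\, dF(x).$

$\phi_0(t)$ is an even function of $t$, and $\phi_1(t)$ is an odd function of $t$. A derivative of $\phi_0(t)$ of odd order which exists at $t = 0$ must be zero there, and the same is true of an even derivative of $\phi_1(t)$.

Let $k$ be an odd positive integer, and suppose that $\phi^{(k)}(0)$ exists. It follows from the last paragraph that

$\phi^{(k)}(0) = i \phi_1^{(k)}(0),$

and so has real part zero. $\phi^{(k-1)}(0)$ must exist and be finite. As $k - 1$ is even, this means that $\mu_{k-1}$ is finite [1]. Therefore $\phi^{(k-1)}(t)$ exists and is finite for all real $t$, and

(6) $\phi^{(k-1)}(t) = i^{k-1} \int_{-\infty}^{\infty} x^{k-1} e^{itx}\, dF(x),$

$\frac{\phi^{(k-1)}(t) - \phi^{(k-1)}(0)}{t} = i^{k-1} \int_{-\infty}^{\infty} x^{k-1}\, \frac{e^{itx} - 1}{t}\, dF(x)$

$= -i^{k-1} \int_{-\infty}^{\infty} x^{k-1}\, \frac{1 - \cos tx}{t}\, dF(x) + i^k \int_{-\infty}^{\infty} x^{k-1}\, \frac{\sin tx}{t}\, dF(x).$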
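Put $G(x) = 1 - F(-x)$. This is a non-decreasing function of $x$. We may write

(7) $\frac{\phi^{(k-1)}(t) - \phi^{(k-1)}(0)}{t} = -\frac{2 i^{k-1}}{t} \int_0^{\infty} x^{k-1} \sin^2 (\tfrac{1}{2} tx)\, d\{F(x) + G(x)\}$

$\quad + i^k \left[ \int_0^{\infty} x^{k-1}\, \frac{\sin tx}{t}\, dF(x) - \int_0^{1/t} x^k\, dF(x) \right]$

$\quad - i^k \left[ \int_0^{\infty} x^{k-1}\, \frac{\sin tx}{t}\, dG(x) - \int_0^{1/t} x^k\, dG(x) \right] + i^k \int_{-1/t}^{1/t} x^k\, dF(x).$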

Because $\phi^{(k)}(0)$ is purely imaginary, when $t \to 0$ the coefficient of $i^{k-1}$ must $\to 0$. Hence $F(x)$ and $G(x)$ both satisfy (4) of Lemma 1 (with $T = 2/t$). Therefore they satisfy (1), i.e.,

$T^k \{F(\infty) + G(\infty) - F(T) - G(T)\} \to 0$ as $T \to \infty$,

$T^k \{1 - F(T) + F(-T)\} \to 0$ as $T \to \infty$,

which is equivalent to condition (i).

By Lemma 2 (with $T = 1/t$), the second and third terms on the right-hand side of (7) both $\to 0$ as $t \to 0$, and therefore

$i^k \lim_{t \to 0} \int_{-1/t}^{1/t} x^k\, dF(x) = \phi^{(k)}(0).$

Condition (ii) is thus necessary.
To prove that conditions (i) and (ii) are sufficient, suppose them satisfied. $F(x)$ and $G(x)$ satisfy (1) of Lemma 1 and therefore (3) also. Hence

$\int_0^{\infty} x^{k-1}\, dF(x)$ and $\int_0^{\infty} x^{k-1}\, dG(x)$

are both finite, and

$\mu_{k-1} = \int_0^{\infty} x^{k-1}\, d\{F(x) + G(x)\}$

is finite. (6) is then true, and therefore (7). When $t \to 0$, the first and second terms on the right-hand side of (7) both $\to 0$, and the third term $\to$ a limit. Thus $\phi^{(k)}(0)$ exists, and

$\phi^{(k)}(0) = i^k \lim_{T \to \infty} \int_{-T}^{T} x^k\, dF(x).$
N-DIMENSIONAL DISTRIBUTIONS CONTAINING A NORMAL
COMPONENT


By CHARLES STANDISH 


Cornell University 


In this paper we obtain necessary and sufficient conditions for an n-dimen- 
sional distribution function F(z;, --- , Zn) to contain as a factor the distribu- 
tion function of n independent normal random variables having common mean 
zero and variance 1. That is we obtain conditions for F(x, --- ,z,) to be of 
the form 


(1) F(m,-++,2n) = [one [Ge = ty +5 0, — te) dP Cen = 5 te) 


where P(u , --- , Un) is a distribution function and 
1 n pr Zn 
Gai, 520) = (Se) Po oe [exp lad + oe + da 2 dan, 


If we denote $\partial^n F(x_1, \cdots, x_n) / \partial x_1 \cdots \partial x_n$ by $f(x_1, \cdots, x_n)$, the problem becomes that of representing $f(x_1, \cdots, x_n)$ in the form


f(t, *** , Zn) -(%) £ 


[exp {lar = a)? + ++ Gn = a) dP Cu ++ 1). 


(2) 


The one-dimensional case has been treated by Pollard [1] employing properties of the heat equation. We use a different approach to prove the following

THEOREM. $f(x_1, \cdots, x_n)$ is representable in the form (2) with $P(u_1, \cdots, u_n)$ a distribution function if and only if


(i) [ive [sles ++ 20) dy 2+ dag = 1 


(ii) f(z; , --* , tn) ts bounded and has mized partial derivatives of all orders 
satisfying 
gf .-- okn wake t+ +h 
Fagg Sty +++ ye) | SAMBA VT Te, 
ky, a » Ka = 1,2,--- ° 
2 CJ (—y) tt tenet sada en a" lt 


kn 
eo 7... 2 A ast ae 
Oz, 


k}=0 knwo Akt +PR bl eee Kyl az*! baa 


lal<1,-++,J/a|]<1. 


Received September 28, 1955. 
PROOF. We carry out the proof for $n = 2$, the proof for $n > 2$ proceeding in exactly the same fashion. The necessity of (i) is obvious. As for (ii) we have


ak, aks 


ame f(a, 9 2X2) < 2 | | | Hy, (a1 — us) Hy... (a2 — Ue) 
Ox} Oak?" T Jo oo ° 
- exp {—[(a1 — wm)” + (22 — u)*]} dP(us, us) |, 


where $H_k(x)$ is the $k$th Hermite polynomial which satisfies

(3) $|H_k(x)| \le A\, 2^{k/2} \sqrt{k!}\, \exp (x^2/2)$

([2], p. 236). Hence the integral above is majorized by


25 ky + ke aa - — a » 0 . : ‘ . . 
A 2— 5 V kil ke! | | exp (—$3[(a1 — mw) + (at. — w)"}} dP(w, w), 


° ° - ky + ke 
which is $A" 2 —,— 


~ 


Vk; !ke!. To establish the necessity of (iii) we observe 


that we have formally 


2 ca (—1)*t regia gh? a a2ke 


yi : a, F(a , X2) 


ki=0 kgmo0 4” thy ke! x21 O22 


2 SG (=1)" et 
" l | os i 0 “Gebel Hx (a1 — us)Hm,(%2 — 2) 
—0o J—oo kim = vy. Ke: 


x exp {—[(a, — mw)” + (x2 — w)"}} dP(ur, us). 


From (3) it is seen that the double series in the integrand converges if all terms are replaced by their absolute values provided $|t_1| < 1$, $|t_2| < 1$, and the integral may be written as


p= re PS (= 1)" tr TP & (-—1)"4! 
(4) | | |= Sat Hx (x1 _ w)| PS THE Ax, (x2 — U2) 
oo — 0 k =0 7 vl]. a= “has 


x exp {—[(a, — wm)? + (22 — w)*]} dP(u, w), 
but 
> (=I) ek iano ] F zt) 
e% 4" },,! 2ki\st )= V/1 ee t, exp ( om 1 2” t, / 


({1], p. 580), and (4) becomes 


(5) [ , Treni Bo - {{‘ 


which for fixed $t_1$ and $t_2$ is $\le$ constant $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |dP(u_1, u_2)|$. This justifies the formal manipulations above and (5) is clearly non-negative, establishing the necessity of (iii). For the sufficiency we need a couple of lemmas.


} dP(w, U2), 


/ 
LEMMA 1. Denoting the left-hand side of (iii) by $T_{t_1, t_2} f(x_1, x_2)$ we have for functions $f(x_1, x_2)$ satisfying (ii)


lim | | exp {—[(z, — m)* + (a2 — U2) V} Tt ,,t.f(ur, U2) du du, = f(xy, 22). 


t +1 
tel 
Proor. The estimates furnished by (ii) enable us to write 


a2ke 


2k 
E [ exp { — [(z; — m)° + (x2 — w)*}} a 


Ouz*: Ar i2k2° 


f(u,, Ue) du dus 


a 2k 
1 oO 2 


a I exp [— (x2 — us)" due [ exp [—(a, — w)’] sero U2) du, 


and upon integrating the inner integral 2k, times by parts we have 


2 co g™ P o"*? 
[. exp [—(z2 — 12)’ dus | aah exp [—(a; — m)’] aunt U2) duy 


« 2k © 2k 
1 ° 2 0 2 5 
- . aur exp [—(a, — m)] dm . exp [—(ze — ue)’ juge I» Us) dur. 


We integrate 2k, more times by parts and obtain finally 


a) 2 a” f 
- exp [—(a, — 1)’ he ; exp [—(xe — we)"| f(ur, w2) du due. 


20 Ou; ue 


Thus 


lim / | exp {—[(a, — um)? + (x2 — w)I}Te,.1.f (tr, Ue) day dus 
a 
(—1)**47! cr. 
' -_ [. [. | >, 4*i ky! Hx (x a us) [ > 4*: ke! Hr, (22 2 ua) 
tel 


x exp {—[(a. — m)” + (a — us)'}} f(u, Ue) du, duy. 
By (4) and (5) this becomes 


(tim > eile fF exp — (a — wu)’ du ) 
a Wi-~tine i-t, : 
(t%2 — UW)? 


x (lim aa L exp | — -o | fou, U2) du) = f(x, 22). 


LEMMA 2. 


=, T t,.t2f(u, Us) du, diy == SS | ty | < i, lt | < : 
T Jw 4-2 
PROOF.


er Ts 
- [ [ T't,t2f (ur, U2) du due 


= : [ [ dx, dx [ r exp — (a — ™)” + (a2 - Ua) IT t,,09f (ua, U2) du; dug 


1 oO peo 2 po 1 if (xy in u)* (x _ U2)? 
=F f [oanae | eet ink * 48 } 
x flu, Ue) du, duz 


-; [ ic wa) de din ‘ [. Ste 


2 2 
x exp {—| = wy + a da, dz, = 1. 
7 ae 


By the above lemma and (iii) the family of functions 


zi ze 
Prats a) = [ [ T t,,t2f(ui, U2) du duz 


is monotone in the sense of Bochner ([3], p. 383) and uniformly bounded; hence there exist sequences $\{t_{1n}\}$, $\{t_{2n}\}$ such that $t_{1n} \to 1$, $t_{2n} \to 1$ and a function $P(x_1, x_2)$ monotone and bounded such that

$\lim_{n \to \infty} P_{t_{1n}, t_{2n}}(x_1, x_2) = P(x_1, x_2)$

([3], p. 389-390). By Lemma 1,
f(a, 2%) = lim EE exp { — [(z, — m)* + (22 — 1%)"}} OP tn ,tan(U » Ua). 
By the formula for integration by parts in two dimensions [4] the above integral 


becomes 
2 


: oO 2 a ; 
lim if Prin.ton (Ua » Ue) judu, {—[(e. — w)* + (a2 — w)']} dus due, 
and integrating by parts again 
1 eo 2 
fla,2) =* | [exp {=U — wi)? + (er — wl} aPC, ws). 


To complete the proof that $P(u_1, u_2)$ is a distribution function we observe that by condition (i)


l= [ [se » %2) da, dx. 


: Li dP (uu v2) Li exp {—[(tm — m)* + (22 — u2)"]} dai daz 


EE dP(uy , t42). 
A CERTAIN CLASS OF TESTS OF FIT
By LIONEL WEISS
University of Oregon


1. Summary and introduction. Suppose $X_1, X_2, \cdots, X_n$ are known to be independently and identically distributed, each with the density function $f(x)$, with $\int_0^1 f(x)\, dx = 1$. Let $Y_1 \le Y_2 \le \cdots \le Y_n$ be the ordered values of $X_1, X_2, \cdots, X_n$, and define $W_1 = Y_1$, $W_2 = Y_2 - Y_1$, $\cdots$, $W_n = Y_n - Y_{n-1}$, and $W_{n+1} = 1 - Y_n$, so that $W_1 + \cdots + W_{n+1} = 1$. Finally, define $Z_1, \cdots, Z_{n+1}$ as the ordered values of $W_1, \cdots, W_{n+1}$, so that $0 \le Z_1 \le Z_2 \le \cdots \le Z_{n+1}$, with $Z_1 + \cdots + Z_{n+1} = 1$. We are going to test the hypothesis that $f(x) = 1$ for $0 < x < 1$, and we are going to consider only tests based on $Z_1, Z_2, \cdots, Z_n$. The intuitive justification for this is that, roughly speaking, deviations from the hypothesis on any part of the unit interval are treated alike. Several authors have discussed tests based on $Z_1, \cdots, Z_n$. (See references [1], [2], [8].)

If $u$ is a number greater than unity, it is shown that the test of the form "reject the hypothesis if $Z_1^u + \cdots + Z_{n+1}^u > K$" is consistent against a very wide class of alternatives. When $u = 2$, the resulting test has some desirable properties with respect to alternatives with linear density functions.
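
A minimal sketch of how the statistic $Z_1^u + \cdots + Z_{n+1}^u$ might be computed from a sample is given below. The function name `spacing_statistic` and the input values are hypothetical; the critical constant $K$ is not computed here.

```python
def spacing_statistic(sample, u=2.0):
    """Return Z_1^u + ... + Z_{n+1}^u for observations in the open interval (0, 1)."""
    ys = sorted(sample)
    if not ys or ys[0] <= 0.0 or ys[-1] >= 1.0:
        raise ValueError("observations must lie strictly inside (0, 1)")
    edges = [0.0] + ys + [1.0]
    spacings = [b - a for a, b in zip(edges, edges[1:])]   # W_1, ..., W_{n+1}
    return sum(w ** u for w in spacings)                   # ordering the W's does not change the sum

value = spacing_statistic([0.5, 0.2, 0.9])   # spacings 0.2, 0.3, 0.4, 0.1
print(value)
```

Because the statistic is a symmetric function of the spacings, the ordering step that turns the $W$'s into the $Z$'s can be skipped when only the sum of powers is needed.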


Received August 22, 1955; revised June 20, 1956.
1 Research under contract with the Office of Naval Research.

2. The distribution of $Z_1, \cdots, Z_n$. It is easily seen that $P[Z_i = Z_j$ for any $i \ne j]$ is equal to zero. We want to find the joint density function $h(z_1, \cdots, z_n)$ of $Z_1, \cdots, Z_n$. The joint density function of $W_1, \cdots, W_n$ is equal to $n!\, f(w_1) f(w_1 + w_2) \cdots f(w_1 + w_2 + \cdots + w_n)$ in the region $w_i \ge 0$, $w_1 + \cdots + w_n \le 1$, and is equal to zero elsewhere. Let $\{j(1), j(2), \cdots, j(n+1)\}$ be any permutation of the first $n + 1$ integers, and let $\sum_{(j)}$ denote summation over all the $(n+1)!$ permutations. Given any set of numbers $0 < z_1 < z_2 < \cdots < z_n < 1 - (z_1 + \cdots + z_n)$, we denote by $Q[j(1), j(2), \cdots, j(n+1)]$ the conditional probability that $W_i = z_{j(i)}$ for $i = 1, \cdots, n+1$, given that $Z_i = z_i$ for $i = 1, \cdots, n+1$. It is understood that if $j(i) = n + 1$, then $z_{j(i)} = 1 - (z_1 + \cdots + z_n)$. For each set of values $z_1, \cdots, z_n$ for which $h(z_1, \cdots, z_n)$ is positive, we have

$Q[j(1), \cdots, j(n+1)] = \frac{n!\, f(z_{j(1)}) f(z_{j(1)} + z_{j(2)}) \cdots f(z_{j(1)} + \cdots + z_{j(n)})}{h(z_1, \cdots, z_n)}.$

Since $\sum_{(j)} Q[j(1), \cdots, j(n+1)] = 1$, for each set of values $z_1, \cdots, z_n$ for which $h(z_1, \cdots, z_n)$ is positive, we have $h(z_1, \cdots, z_n) = n! \sum_{(j)} f(z_{j(1)}) f(z_{j(1)} + z_{j(2)}) \cdots f(z_{j(1)} + \cdots + z_{j(n)})$. Now let $D$ be the region in $(z_1, \cdots, z_n)$-space where the following three conditions are satisfied:

(1) $0 \le z_1 \le z_2 \le \cdots \le z_n \le 1 - (z_1 + \cdots + z_n)$,

(2) $h(z_1, \cdots, z_n) = 0$,

(3) $n! \sum_{(j)} f(z_{j(1)}) f(z_{j(1)} + z_{j(2)}) \cdots f(z_{j(1)} + \cdots + z_{j(n)}) > 0$.

Then $D$ must be of measure zero. For if $D$ is of positive measure, then

$\int \cdots \int_D n! \sum_{(j)} f(z_{j(1)}) f(z_{j(1)} + z_{j(2)}) \cdots f(z_{j(1)} + \cdots + z_{j(n)})\, dz_1 \cdots dz_n > 0,$

which implies that $P[(W_1, \cdots, W_n)$ in $D] > 0$, which in turn implies that $P[(Z_1, \cdots, Z_n)$ in $D] > 0$, which is a contradiction. Therefore we have shown that if $0 \le z_1 \le z_2 \le \cdots \le z_n \le 1 - (z_1 + \cdots + z_n)$, then

(2.1) $h(z_1, \cdots, z_n) = n! \sum_{(j)} f(z_{j(1)}) f(z_{j(1)} + z_{j(2)}) \cdots f(z_{j(1)} + \cdots + z_{j(n)}),$

while $h(z_1, \cdots, z_n)$ is zero for other values of $z_1, \cdots, z_n$. We note that when $f(x) = 1$, the right-hand side of (2.1) is equal to $n!\,(n + 1)!$.


3. Properties of the power of tests based on $Z_1, \cdots, Z_n$. Let $r(x)$ be a given bounded measurable function of $x$ satisfying the conditions

$\int_0^1 r(x)\, dx = 0, \qquad \int_0^1 r^2(x)\, dx > 0.$

Then for $\delta$ small enough in absolute value, $1 + \delta r(x)$ is a density function on $(0, 1)$. For any given measurable region $R$ in $(z_1, \cdots, z_n)$-space, we denote by $M(R, \delta)$ the probability that $(Z_1, \cdots, Z_n)$ will fall in $R$, assuming the density of the original observations is equal to $1 + \delta r(x)$. In what follows, we shall always assume that $R$ is a subset of the region

$0 \le z_1 \le z_2 \le \cdots \le z_n \le 1 - (z_1 + \cdots + z_n).$

For any given region $R$, we have

(3.1) $\left. \frac{dM(R, \delta)}{d\delta} \right|_{\delta = 0} = n! \int \cdots \int_R \sum_{(j)} \sum_{k=1}^{n} r(z_{j(1)} + \cdots + z_{j(k)})\, dz_1 \cdots dz_n,$

(3.2) $\left. \frac{d^2 M(R, \delta)}{d\delta^2} \right|_{\delta = 0} = 2\, n! \int \cdots \int_R \sum_{(j)} \sum_{1 \le k < L \le n} r(z_{j(1)} + \cdots + z_{j(k)})\, r(z_{j(1)} + \cdots + z_{j(L)})\, dz_1 \cdots dz_n.$

Equations (3.1) and (3.2) follow easily when $f(x)$ in (2.1) is replaced by $1 + \delta r(x)$, and the result is expressed as a polynomial in $\delta$.


4. The case of linear $r(x)$. The integrands in (3.1) and (3.2) are complicated for many functions $r(x)$. However, when $r(x) = x - \tfrac{1}{2}$, we have (remembering $z_{n+1} = 1 - (z_1 + \cdots + z_n)$) that the integrand in (3.1) is identically equal to zero, while the integrand in (3.2) is equal to

(4.1) $\frac{(n - 1)(n + 2)!}{24}\, (z_1^2 + \cdots + z_{n+1}^2) - \frac{(n - 1)(n + 1)!}{12}.$

Suppose we are testing the hypothesis that $f(x) = 1$ against alternatives of the form $f(x) = 1 + \delta (x - \tfrac{1}{2})$, with given level of significance $\alpha$. We are going to consider only tests based on $Z_1, \cdots, Z_n$, so that our critical region will be a region in $(Z_1, \cdots, Z_n)$-space. We want to find the critical region $R$ satisfying the following three conditions:

(1) $M(R, 0) = \alpha$,

(2) $\left. \frac{dM(R, \delta)}{d\delta} \right|_{\delta = 0} = 0$,

(3) $\left. \frac{d^2 M(R, \delta)}{d\delta^2} \right|_{\delta = 0}$ is a maximum.

In the terminology of [6], this region $R$ would be called an "unbiassed critical region of type A" for testing the hypothesis that $\delta = 0$. We know that in the present case, condition (2) is automatically satisfied by any region $R$, since the integrand in (3.1) is identically zero. But then a very simple application of the Neyman-Pearson lemma shows that the desired region $R$ is given by

$\frac{2\, n!\, \{(n - 1)(n + 2)!\, (z_1^2 + \cdots + z_{n+1}^2)/24 - (n - 1)(n + 1)!/12\}}{n!\, (n + 1)!} \ge K(\alpha),$

where $K(\alpha)$ is a properly chosen constant. Equivalently, $R$ is given by

$z_1^2 + \cdots + z_{n+1}^2 \ge k(\alpha),$

where $k(\alpha)$ is a properly chosen constant.
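
The closed form (4.1) can be checked by brute force for small $n$. The sketch below sums $r(z_{j(1)} + \cdots + z_{j(k)})\, r(z_{j(1)} + \cdots + z_{j(L)})$ over all permutations and all pairs $k < L \le n$, with $r(x) = x - \tfrac{1}{2}$, and compares the total with $(n-1)(n+2)!\,(z_1^2 + \cdots + z_{n+1}^2)/24 - (n-1)(n+1)!/12$. The function names and the test point are assumptions made for this illustration.

```python
from itertools import permutations
from math import factorial

def partial_sums_minus_half(perm, n):
    """r(s_k) = s_k - 1/2 for the first n partial sums s_k of the permuted values."""
    out, s = [], 0.0
    for value in perm[:n]:
        s += value
        out.append(s - 0.5)
    return out

def integrand_31(z):
    """Sum over permutations of sum_k r(s_k); expected to vanish for r(x) = x - 1/2."""
    n = len(z) - 1
    return sum(sum(partial_sums_minus_half(p, n)) for p in permutations(z))

def integrand_32(z):
    """Sum over permutations of sum_{k<L} r(s_k) r(s_L)."""
    n = len(z) - 1
    total = 0.0
    for p in permutations(z):
        r = partial_sums_minus_half(p, n)
        total += sum(r[k] * r[l] for k in range(n) for l in range(k + 1, n))
    return total

def formula_41(z):
    """Closed form (4.1); z holds n + 1 values that sum to 1."""
    n = len(z) - 1
    q = sum(v * v for v in z)
    return (n - 1) * factorial(n + 2) * q / 24 - (n - 1) * factorial(n + 1) / 12

z = [0.05, 0.15, 0.2, 0.25, 0.35]   # n = 4; the last entry plays the role of z_{n+1}
print(integrand_31(z), integrand_32(z), formula_41(z))
```

The first printed value should be zero up to rounding, and the last two should agree.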


5. Consistency of the proposed test. In this section we prove that the test described in Section 4 is one of a class of tests, any one of which is consistent against a wide class of alternatives. First we need some lemmas.

LEMMA 1. If $g(x)$, the common density of $X_1, \cdots, X_n$, has at most a finite number of discontinuities, and if $R_n(t)$ denotes the proportion of the values $Z_1, \cdots, Z_{n+1}$ which are not greater than $t/(n + 1)$, while $S(t)$ denotes $1 - \int_0^1 g(x) \exp \{-t g(x)\}\, dx$, and $V(n)$ denotes $\sup_{t \ge 0} |R_n(t) - S(t)|$, then $V(n)$ converges to zero with probability one as $n$ increases.

PROOF. This is proved in [4].

Now we introduce the following notation. Let $u$ be any positive number, while $Y_n$ shall denote $[\Gamma(n + u + 1)/\Gamma(n + 2)][Z_1^u + \cdots + Z_{n+1}^u]$. Let $g(x)$ denote the common density of $X_1, \cdots, X_n$, and define $J(g; u)$ as

$\Gamma(u + 1) \int_{\{x:\, g(x) > 0\}} [g(x)]^{1-u}\, dx.$

$J(g; u)$ may fail to exist (that is, be infinite).
LEMMA 2. If $J(g; u)$ is finite, then given any positive numbers $\epsilon$, $\delta$, there is a positive integer $N(\epsilon, \delta)$ such that

$P[Y_n > J(g; u) - \epsilon$ simultaneously for all $n > N(\epsilon, \delta)] > 1 - \delta.$

If $J(g; u)$ fails to exist, then given any positive numbers $B$, $\delta$, there is a positive integer $M(B, \delta)$ such that

$P[Y_n > B$ simultaneously for all $n > M(B, \delta)] > 1 - \delta.$
PROOF. In the notation of Lemma 1, we have

$Z_1^u + \cdots + Z_{n+1}^u = (n + 1)^{1-u} \int_0^{\infty} t^u\, dR_n(t),$

and therefore $Y_n = [\Gamma(n + u + 1)/\Gamma(n + 2)(n + 1)^{u-1}] \int_0^{\infty} t^u\, dR_n(t)$. Now we choose any positive number $T$ and hold it fixed until further notice. We have $Y_n \ge [\Gamma(n + u + 1)/\Gamma(n + 2)(n + 1)^{u-1}] \int_0^T t^u\, dR_n(t)$. As $n$ increases, the coefficient of the integral in this last expression approaches unity, and from now on we shall treat it as unity, and it will be seen that this does not affect our conclusion. We have $\int_0^T t^u\, dR_n(t) = T^u R_n(T) - u \int_0^T t^{u-1} R_n(t)\, dt$, and by Lemma 1, the expression on the right of this equality approaches the following with probability one:

$T^u \left[ 1 - \int_0^1 g(x) \exp \{-T g(x)\}\, dx \right] - u \int_0^T t^{u-1}\, dt + u \int_0^1 g(x) \left\{ \int_0^T t^{u-1} \exp [-t g(x)]\, dt \right\} dx,$

which equals

$-\int_0^1 T^u g(x) \exp [-T g(x)]\, dx + u \int_{\{x:\, g(x) > 0\}} \{g(x)\}^{1-u} \int_0^{T g(x)} r^{u-1} e^{-r}\, dr\, dx.$

But by taking $T$ large enough, this last expression can be made arbitrarily close to $J(g; u)$ if it exists, or it can be made arbitrarily large if $J(g; u)$ fails to exist. This proves Lemma 2.

LEMMA 3. If the common density of $X_1, \cdots, X_n$ is uniform on $(0, 1)$, then $Y_n$ converges stochastically to $\Gamma(u + 1)$ as $n$ increases.

PROOF. This is proved directly from the discussion on page 245 of [5].
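
Lemma 3 can be illustrated by simulation. The sketch below draws a uniform sample with a fixed seed and evaluates $Y_n = [\Gamma(n + u + 1)/\Gamma(n + 2)][Z_1^u + \cdots + Z_{n+1}^u]$ for $u = 2$; the sample size, the seed, and the helper name `y_n` are choices made for this example, and a single run is of course not a proof of convergence.

```python
import math
import random

def y_n(sample, u):
    """Y_n = [Gamma(n+u+1) / Gamma(n+2)] * (Z_1^u + ... + Z_{n+1}^u)."""
    n = len(sample)
    edges = [0.0] + sorted(sample) + [1.0]
    power_sum = sum((b - a) ** u for a, b in zip(edges, edges[1:]))
    log_coeff = math.lgamma(n + u + 1) - math.lgamma(n + 2)   # log of the Gamma ratio, avoids overflow
    return math.exp(log_coeff) * power_sum

rng = random.Random(12345)
u = 2.0
sample = [rng.random() for _ in range(20000)]
print(y_n(sample, u), math.gamma(u + 1))
```

For a sample of this size the two printed numbers should be close, the second being $\Gamma(3) = 2$.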
LEMMA 4. If $u > 1$, and if $g(x)$ is positive almost everywhere on $(0, 1)$ and differs from unity on a subset of $(0, 1)$ of positive measure, then $J(g; u) > \Gamma(u + 1)$.

PROOF. For convenience, we omit the limits of integration, which are always zero and one throughout this proof. Hölder's inequality states that if $p > 0$, $q > 0$, and $p + q = pq$, then

$\left| \int r(x) s(x)\, dx \right| \le \left( \int |r(x)|^p\, dx \right)^{1/p} \left( \int |s(x)|^q\, dx \right)^{1/q}$

with equality holding if and only if $|r(x)|^p = K |s(x)|^q$ almost everywhere, where $K$ is a constant, and either $r(x) s(x) \ge 0$ almost everywhere or $r(x) s(x) \le 0$ almost everywhere. Applying this inequality with $r(x) = [g(x)]^{(u-1)/u}$, $s(x) = [g(x)]^{-(u-1)/u}$, $p = u/(u - 1)$, and $q = u$, the lemma follows immediately.

THEOREM. Suppose it is known that $X_1, X_2, \cdots$ are independently and identically distributed, and it is desired to test the hypothesis that the common distribution is the uniform distribution over $(0, 1)$. For a given level of significance $\alpha$ $(0 < \alpha < 1)$, a given number $u > 1$, and a given positive integer $n$, let $T(\alpha; n; u)$ denote the test of the hypothesis described as follows: Reject the hypothesis if and only if at least one of the following occurs:

(1) At least one of the values $X_1, \cdots, X_n$ falls outside the open interval $(0, 1)$,

(2) $X_i = X_j$ for some integers $i, j$ with $1 \le i < j \le n$,

(3) $Z_1^u + Z_2^u + \cdots + Z_{n+1}^u \ge K(\alpha; n; u)$,

where $K(\alpha; n; u)$ is a constant chosen to give the proper level of significance. Then the sequence of tests $\{T(\alpha; n; u), T(\alpha; n + 1; u), \cdots\}$ is consistent against any alternative common distribution function $G(x)$ with at least one of the following properties:

(1') $G(0) > 0$,

(2') $G(1) < 1$,

(3') $G(x)$ has at least one positive saltus,

(4') $G(0) = 0$, $G(1) = 1$, $G(x)$ is absolutely continuous with derivative $g(x)$, and $g(x)$ differs from unity on a subset of $(0, 1)$ of positive measure and has at most a finite number of discontinuities, and a finite number of oscillations.

PROOF. If $G(x)$ has property (1') or (2'), specification (1) of $T(\alpha; n; u)$ proves consistency. If $G(x)$ has property (3'), specification (2) of $T(\alpha; n; u)$ proves consistency. If property (4') is possessed by $G(x)$, we distinguish two cases, according to whether or not $g(x)$ is positive almost everywhere on $(0, 1)$.

Case 1: $g(x)$ is positive almost everywhere on $(0, 1)$. We may express specification (3) of the test in terms of $Y_n$ defined above. For large $n$, Lemma 3 tells us that specification (3) of the test is essentially $Y_n > \Gamma(u + 1)$. But Lemmas 2 and 4 guarantee that under $G(x)$ the probability is high that $Y_n$ will be greater than $\Gamma(u + 1)$ if $n$ is large.

Case 2: $g(x)$ is zero on a subset of $(0, 1)$ of positive measure. Since $g(x)$ has at most a finite number of discontinuities, a point $w$ in the interior of $(0, 1)$ can be found such that $g(x)$ is continuous in a neighborhood of $w$, $g(w) = 0$, and any neighborhood of $w$ contains a set of positive measure on which $g(x) = 0$. Since $g(x)$ has a finite number of oscillations, this implies that there is an interval of positive length $A$ in the interior of $(0, 1)$ on which $g(x) = 0$. But then the largest of the values $Z_1, \cdots, Z_{n+1}$ is certainly no smaller than $A$; therefore $Y_n$ is certainly no smaller than $A^u\, \Gamma(n + u + 1)/\Gamma(n + 2)$, and this last expression approaches infinity as $n$ increases.
ON THE PROBABILITY OF LARGE DEVIATIONS FOR SUMS
OF BOUNDED CHANCE VARIABLES 


By Harry WEINGARTEN 


Bureau of Ships, Navy Department 


1. Summary. The following theorems are proved.

THEOREM 1. If $x_1, x_2, \cdots$ satisfy $-1 \le x_n \le a$, $a \le 1$ and

$E(x_n \mid x_1, \cdots, x_{n-1}) \le -u \max (|x_n| \mid x_1, \cdots, x_{n-1}),$

$0 < u < 1$, then for any positive $t$,

$\Pr \{x_1 + \cdots + x_n \ge t$ for some $n\} \le \theta^t,$

where $\theta$ is the positive root (other than $\theta = 1$) of

(1) $\frac{a + u}{a + 1}\, \theta^{a+1} - \theta^a + \frac{1 - u}{a + 1} = 0.$

This choice of $\theta$ is the best possible.
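
For a numerical feel, the root $\theta$ of (1) can be located by bisection. The sketch below assumes $0 < a \le 1$ and writes (1) as $p\theta^{a+1} - \theta^a + q = 0$ with $p = (a + u)/(a + 1)$ and $q = (1 - u)/(a + 1)$; the function name `theta_root` and the bracketing interval are choices made here. The left side equals $q > 0$ at $\theta = 0$ and is smallest at $\theta = a/(a + u)$, where it is negative, so the root other than 1 lies between these two points.

```python
def theta_root(a, u, tol=1e-12):
    """Root in (0, 1) of p*theta**(a+1) - theta**a + q = 0, found by bisection."""
    if not (0.0 < a <= 1.0 and 0.0 < u < 1.0):
        raise ValueError("expected 0 < a <= 1 and 0 < u < 1")
    p = (a + u) / (a + 1.0)
    q = (1.0 - u) / (a + 1.0)

    def f(theta):
        return p * theta ** (a + 1.0) - theta ** a + q

    lo, hi = 0.0, a / (a + u)        # f(lo) = q > 0, f(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(theta_root(1.0, 0.5))
```

When $a = 1$ the equation is quadratic with roots $1$ and $(1 - u)/(1 + u)$, which gives a convenient check on the routine.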
THEOREM 2. If $x_1, x_2, \cdots$ satisfy $|x_n| \le 1$ and $E(x_n \mid x_1, \cdots, x_{n-1}) = 0$, then for all $N > 0$,

$\Pr \left\{ \frac{x_1 + \cdots + x_n}{n} \ge \epsilon \text{ for some } n \ge N \right\} \le \varphi^N,$

where $\varphi = (1 + \epsilon)^{-(1+\epsilon)/2} (1 - \epsilon)^{-(1-\epsilon)/2}$. This choice of $\varphi$ is, for every $\epsilon$ between 0 and 1, the best possible.
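
A small sanity check of the constant in Theorem 2, taking $\varphi = (1 + \epsilon)^{-(1+\epsilon)/2} (1 - \epsilon)^{-(1-\epsilon)/2}$: for independent $x_i = \pm 1$ with probability $\tfrac{1}{2}$ each, the event $\{(x_1 + \cdots + x_N)/N \ge \epsilon\}$ is contained in the event of the theorem, so its exact binomial probability should not exceed $\varphi^N$. The function names and the grid of values below are assumptions for this illustration, which checks a consequence of the bound and not the theorem itself.

```python
from math import comb

def phi(eps):
    """phi = (1+eps)^(-(1+eps)/2) * (1-eps)^(-(1-eps)/2), for 0 < eps < 1."""
    return (1.0 + eps) ** (-(1.0 + eps) / 2.0) * (1.0 - eps) ** (-(1.0 - eps) / 2.0)

def fair_coin_tail(n, eps):
    """Exact P{(x_1+...+x_n)/n >= eps} for independent x_i = +1 or -1, each with probability 1/2."""
    favourable = 0
    for heads in range(n + 1):
        if 2 * heads - n >= eps * n:      # the sum equals 2*heads - n
            favourable += comb(n, heads)
    return favourable / 2 ** n

for n in (10, 20, 50):
    for eps in (0.1, 0.5, 0.9):
        print(n, eps, fair_coin_tail(n, eps), phi(eps) ** n)
```

In each printed row the exact tail probability should be no larger than the bound in the last column.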


Both results are improvements of results of Blackwell [1], and the methods of 
proof are somewhat similar. 


2. Proofs of the Theorems. Since the methods for Theorems 1 and 2 are similar to those in [1], we merely indicate the main steps.

For Theorem 1, let $\Phi(N, t)$ be the least upper bound, over all sequences $\{x_n\}$ satisfying the hypotheses of Theorem 1, of the probability

$\Pr \{x_1 + \cdots + x_k \ge t$ for some $k \le N\};$

in particular $\Phi(0, t)$ is 1 for $t \le 0$ and 0 for $t > 0$. Then

$\Phi(N + 1, t) = U \Phi(N, t),$

where $U$ is the transformation taking Borel-measurable functions of $t$ into Borel-measurable functions of $t$, such that the value of $Uf$ at $t$ is

$\sup_{x \in X} E f(t - x),$

where $X$ consists of all chance variables $x$ satisfying $-1 \le x \le a$ and $E x \le -u \max |x|$. Now if $\theta$ satisfies (1), then $Ug = g$, where $g = \theta^t$. Also, $f_1 \ge f_2$ for all $t$ implies $U f_1 \ge U f_2$ for all $t$. Repeated application of this to $g \ge \Phi(0, t)$ yields $g \ge \Phi(N, t)$ for all $t$, and letting $N \to \infty$ completes the proof of Theorem 1. To see that this choice is the best possible consider the sequence $x_1, x_2, \cdots$ independent, with the distribution of each $x_n = a$, and $-1$ with probabilities $(1 - u)/(a + 1)$, $(a + u)/(a + 1)$ respectively. This sequence satisfies the hypotheses of Theorem 1, and it will be shown that

$\Pr \{x_1 + \cdots + x_n \ge t$ for some $n\}^{1/t} \to \theta$

as $t \to \infty$.


To do this we consider a game between two players with fortunes, stakes, etc., as follows:

Players                            P_1                    P_2
Fortunes                           t                      b
Stakes                             a                      1
Probability of winning a game      p = (a + u)/(a + 1)    q = (1 - u)/(a + 1)

The probability of the ruin of $P_1$ which we are interested in is easily seen to be the same as $\Pr \{x_1 + \cdots + x_n \ge t\}$ for the case of a sequence $x_1, x_2, \cdots$ independently and identically distributed with each $x_n = a$, $-1$ with probabilities $(1 - u)/(a + 1)$, $(a + u)/(a + 1)$ respectively.

Let us approximate $a$ by some rational fraction $r/s$ and then change the units
in which the game is played. We will have

Players                            $P_1$      $P_2$
Fortunes                           $st$       $sb$
Stakes                             $r$        $s$
Probability of winning a game      $p$        $q$


Using the results of [2], pages 144-146, we obtain 
sb—s+1 ab 
6; — |] < Yor < go“ 6; — 1 


st 
a Ga _ | = grt? 
where $\theta_1$ is the root of $p\theta_1^{r+s} - \theta_1^{r} + q = 0$ and $y_{st}$ is the probability of the ruin
of $P_1$ when his fortune is $st$. If the fortune of $P_2$ becomes infinite, we have

\[
\theta_1^{st} \le y_{st} \le \theta_1^{st - r + 1}.
\]


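When we return to the original units of the game, we can state

\[
(\theta_1^{s})^{t} \le y_t \le (\theta_1^{s})^{t - (r/s) + (1/s)},
\]

that is,

\[
\theta_2^{t} \le y_t \le \theta_2^{t - (r/s) + (1/s)},
\]

where $\theta_2 = \theta_1^{s}$ is the root of $p\theta_2^{(r/s)+1} - \theta_2^{r/s} + q = 0$.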
By choosing $r$ and $s$ large enough, we may come as close as we wish to $a$, and
so we may finally write

\[
\theta^{t} \le y_t \le \theta^{t - a},
\]

where $\theta$ is the root of $p\theta^{a+1} - \theta^{a} + q = 0$. This is possible, since the probability
of ruin in the game where the stakes are $r$ and $s$ is the general solution of the
difference equation

\[
y_x = p\,y_{x+s} + q\,y_{x-r},
\]

where $y_x$ is the probability of ruin of $P_1$ when his fortune is $x$. Such solutions are
known to be continuous functions of the stakes. That $\theta_2$, the root of
$p\theta_2^{(r/s)+1} - \theta_2^{r/s} + q = 0$, approaches $\theta$, the root of $p\theta^{a+1} - \theta^{a} + q = 0$, follows from the
fact that the roots of a polynomial are continuous functions of the coefficients.
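From this we may obtain

\[
(\theta^{t})^{1/t} \le \Pr\{x_1 + \cdots + x_n \ge t\}^{1/t} \le [\theta^{t-a}]^{1/t}
\]

and so

\[
\Pr\{x_1 + \cdots + x_n \ge t\}^{1/t} \to \theta \quad \text{as } t \to \infty.
\]

As a matter of fact we note that $\theta$ is really a lower bound, since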
ro" > «. 


Hence $\theta$ is best possible.
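As a numerical aside, the root in (0, 1) of $p\theta^{a+1} - \theta^{a} + q = 0$ can be located by bisection. The sketch below is only illustrative: the function name `ruin_base` and the tolerance are assumptions introduced here, not part of the paper. It relies on two elementary facts: the left-hand side equals $q > 0$ at $\theta = 0$, and it is negative at $\theta = a/(a+u)$, where its derivative vanishes (since $\theta = 1$ is always a root and the derivative there is $u > 0$).

```python
def ruin_base(a, u, tol=1e-12):
    """Root in (0, 1) of p*theta**(a+1) - theta**a + q = 0.

    Here p = (a+u)/(a+1) and q = (1-u)/(a+1), with a > 0 and 0 < u < 1.
    The name and the tolerance are hypothetical, chosen for illustration.
    """
    p = (a + u) / (a + 1)
    q = (1 - u) / (a + 1)

    def f(theta):
        return p * theta ** (a + 1) - theta ** a + q

    # f(0) = q > 0; f is decreasing up to a/(a+u) and negative there.
    lo, hi = 0.0, a / (a + u)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For $a = 1$ the equation is quadratic with roots 1 and $q/p = (1-u)/(1+u)$, which gives a simple check of the routine.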
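For the proof of Theorem 2 we have:

\[
\Pr\left\{ \frac{x_1 + \cdots + x_n}{n} \ge \epsilon \ \text{for some } n \ge N \right\}
\le \Pr\{x_1 - k\epsilon + \cdots + x_n - k\epsilon \ge N\epsilon(1 - k) \ \text{for some } n\}
\le \left[(1+\epsilon)^{-(1+\epsilon)/2}(1-\epsilon)^{-(1-\epsilon)/2}\right]^{N},
\]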
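where the last inequality is obtained by applying Theorem 1 to the sequence

\[
y_n = \frac{x_n - k\epsilon}{1 + k\epsilon}.
\]

Here we have taken

\[
a = \frac{1 - k\epsilon}{1 + k\epsilon} \quad \text{and} \quad u = \frac{k\epsilon}{1 + k\epsilon},
\]

and found

\[
\Pr\left\{ y_1 + \cdots + y_n \ge \frac{N\epsilon(1 - k)}{1 + k\epsilon} \right\} \le \theta^{N\epsilon(1-k)/(1+k\epsilon)},
\]

where $\theta$ is the root of

\[
\theta^{2/(1+k\epsilon)} - 2\theta^{(1-k\epsilon)/(1+k\epsilon)} + 1 = 0. \tag{2}
\]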


To find the smallest value of $\theta^{\epsilon(1-k)/(1+k\epsilon)}$ we may proceed in the following manner:
Beginning with (2), we write

\[
\theta^{(1-k\epsilon)/(1+k\epsilon)}(\theta - 2) + 1 = 0, \tag{3}
\]

and solving for $k\epsilon$, we find

\[
k\epsilon = \frac{\log\theta + \log(2 - \theta)}{\log\theta - \log(2 - \theta)}. \tag{4}
\]

Giving $k\epsilon$ the value from (4), we find that $R = \theta^{\epsilon(1-k)/(1+k\epsilon)}$ gives

\[
R = \theta^{-[(1-\epsilon)\log\theta + (1+\epsilon)\log(2-\theta)]/(2\log\theta)}. \tag{5}
\]

If we take logarithms in both sides of (5) and simplify, we may rewrite (5) to
obtain

\[
R^{-2} = \theta^{1-\epsilon}(2 - \theta)^{1+\epsilon}, \tag{6}
\]

and it is very easy to find the value of $\theta$ which makes $R$ a minimum to be

\[
\theta = 1 - \epsilon. \tag{7}
\]


The same inequality holds for

\[
\Pr\left\{ \frac{x_1 + \cdots + x_n}{n} \le -\epsilon \ \text{for some } n \ge N \right\},
\]

and hence Theorem 2 is proved.
To see that $\varphi$ is the best possible, consider the case of the sequence $\{x_n\}$ independently
distributed, each taking the values $\pm 1$ with probabilities $\tfrac{1}{2}$, $\tfrac{1}{2}$.
It follows from a result of Chernoff [3] that

\[
\Pr\{x_1 + \cdots + x_n \ge n\epsilon\}^{1/n} \to \varphi,
\]

so that our $\varphi$ is exact.
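The exactness claim can be checked numerically for independent fair $\pm 1$ steps. The following is a minimal sketch, assuming the constant of Theorem 2 is $\varphi = (1+\epsilon)^{-(1+\epsilon)/2}(1-\epsilon)^{-(1-\epsilon)/2}$; the function names `phi` and `tail_root` are hypothetical. It computes the $n$-th root of the exact binomial tail probability, which should lie below $\varphi$ and approach it as $n$ grows.

```python
from math import ceil, comb, exp, log


def phi(eps):
    """The constant (1+eps)^(-(1+eps)/2) * (1-eps)^(-(1-eps)/2)."""
    return (1 + eps) ** (-(1 + eps) / 2) * (1 - eps) ** (-(1 - eps) / 2)


def tail_root(n, eps):
    """[Pr{x_1 + ... + x_n >= n*eps}]^(1/n) for independent fair +/-1 steps.

    The sum equals 2*H - n with H binomial(n, 1/2), so the event is
    H >= n*(1+eps)/2; the tail is summed exactly with integer arithmetic.
    """
    h_min = ceil(n * (1 + eps) / 2 - 1e-9)  # small guard against rounding
    tail = sum(comb(n, h) for h in range(h_min, n + 1))
    return exp((log(tail) - n * log(2)) / n)
```

For example, `tail_root(2000, 0.5)` can be compared with `phi(0.5)`; the gap shrinks slowly, roughly like $(\log n)/n$.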


REFERENCES 


A REMARK ON THE ROOTS OF THE MAXIMUM LIKELIHOOD 
EQUATION 


By C. KRAFT AND L. LECAM¹
University of California, Berkeley 


1. Introduction and summary. The statistical literature combines two types 
of investigations concerning the consistency of maximum likelihood (M.L.) 
estimates. A few of these, such as the most excellent paper of A. Wald [1], do 
prove directly the consistency of M.L. estimates. However, most investigators 
seem to have concentrated their efforts on proving the existence and consistency 
of suitably selected roots of the successive likelihood equations. Some authors, 
see [2], for example, add the supplementary remark that such consistent roots 
will eventually be unique in suitably small neighborhoods of the true value and 
will achieve a local maximum. 

It is the purpose of the present note to point out by means of examples that 
this second mode of attack is not adequate. In the examples given below, the 
“usual regularity conditions” of Cramér [3] or Wald [4] are satisfied, but the 
M.L. estimates are not consistent. It should also be pointed out that the direct 
proofs of existence of roots, simple in the case of a unidimensional parameter, 
become unwieldy in more than one dimension. On the other hand, if one has 
proved the consistency of the M.L. estimates, the existence of roots follows 
trivially from the fact that when a differentiable function reaches its maximum 
in an open set, the derivatives vanish at that point. 


2. Examples with independent identically distributed variables. The first 
example given below has the following characteristics: 

Received September 21, 1955.

¹ This paper was prepared with the support of the Office of Ordnance Research, U. S.
(1) Cramér’s conditions are satisfied and the condition of identifiability is
satisfied. 

(2) The likelihood equation has roots. 

(3) The M.L. estimate does not exist, except maybe for sets of sample points 
of measure zero. 

(4) There exist consistent estimates. (See Section 3, below.) 

For every nonnegative integer $k$, let $A_k$ be the open interval

\[
A_k = (2k, 2k + 1).
\]

Let $\Theta = \bigcup_{k=0}^{\infty} A_k$, and let $\{a_k\}$, $k = 0, 1, 2, \cdots$, be an arbitrary ordering of
the rationals of the interval (1, 2). Define $\rho(\theta) = a_k$ if $\theta \in A_k$. For each $\theta \in \Theta$,
let the vector $(X, Y)$ have a normal distribution with $E(X \mid \theta) = \rho(\theta) \cos 2\pi\theta$
and $E(Y \mid \theta) = \rho(\theta) \sin 2\pi\theta$, and covariance matrix the identity. If $\{(X_i, Y_i)\}$,
$i = 1, 2, \cdots, n, \cdots$, is a sequence of independent random vectors with the
distribution of $(X, Y)$, the logarithm of the probability density of the first $n$
observations is given by

\[
-2 \log p_n = K_n + n[\bar{X}_n - \rho(\theta) \cos 2\pi\theta]^2 + n[\bar{Y}_n - \rho(\theta) \sin 2\pi\theta]^2,
\]

where $(\bar{X}_n, \bar{Y}_n)$ is the sample mean.
Defining $r_n > 0$ and $\varphi_n$, $0 \le \varphi_n < 1$, by $\bar{X}_n = r_n \cos 2\pi\varphi_n$ and $\bar{Y}_n = r_n \sin 2\pi\varphi_n$,
the above equation can also be written as

\[
-2 \log p_n = K_n + n[r_n - \rho(\theta)]^2 + 2 n r_n \rho(\theta)[1 - \cos 2\pi(\theta - \varphi_n)].
\]

Accordingly, the likelihood equation is $r_n \sin 2\pi(\theta - \varphi_n) = 0$. Therefore, all
values of the form $\theta = \varphi_n + k/2$ which belong to $\Theta$ are solutions of the likelihood
equation. However, if $r_n$ is not rational, the M.L. estimate does not exist,
since $\rho(\theta)$ can be chosen close to $r_n$ but not equal to it.

One could define approximate maximum likelihood estimates as follows. Let
$\{\epsilon_n\}$ be a sequence of positive numbers tending to zero. For each $n$, let $S_n$ be the
set of values of $\theta$ such that

\[
\sup_{t \in \Theta} p_n(x_1, x_2, \cdots, x_n \mid t) \le (1 + \epsilon_n)\, p_n(x_1, \cdots, x_n \mid \theta).
\]

Since every interval, however small, contains an infinity of rationals, for every
$\epsilon_n > 0$, the set $S_n$ will, in our example, have elements in common with an infinite
number of the intervals $A_k$, and therefore the sequence $\{S_n\}$ cannot converge
to a point.

One might object to the preceding example for two reasons. In the usual 
proofs of “consistency” of roots of the M.L. equation, it is assumed that the
random variables are real-valued. However, this assumption is irrelevant to the 
proofs given, so that the bivariate character of the example is no detraction. It 
is, of course, possible to build analogous univariate examples. 

Another feature to which objections can be raised is the nonexistence of the 
M.L. estimate. This is also irrelevant, as shown by the next example, which 
possesses the following characteristics: 
(1) Cramér’s conditions and the condition of identifiability are satisfied.

(2) The likelihood equation has roots. 

(3) With probability tending to unity, the maximum likelihood estimate 
exists, is unique, and is a root of the likelihood equation. 

(4) The M.L. estimate is not consistent.

(5) There exist consistent estimates. 

Let $\Theta$ be $\bigcup_k A_k$ as in the first example, and let $\{a_k\}$ be an ordering of the
rationals of the interval (0, 1). Let $\rho(\theta) = a_k$ if $\theta \in A_k$ and let $(X_i, Y_i, Z_i)$ be
multinomially distributed with probabilities $p_1 = \rho(\theta) \cos^2 2\pi\theta$, $p_2 = \rho(\theta) \sin^2 2\pi\theta$,
and $p_3 = 1 - \rho(\theta)$. For $n$ independent observations, the likelihood function is

\[
\log p_n = n_1 \log p_1 + n_2 \log p_2 + n_3 \log p_3 + f(n_1, n_2, n_3),
\]

where $n_1 = \sum_{i=1}^{n} X_i$, $n_2 = \sum_{i=1}^{n} Y_i$, $n_3 = \sum_{i=1}^{n} Z_i$, and $f$ is a function
which does not depend on $\theta$. Again, the likelihood equation has solutions of the
form $2\pi\theta = \tan^{-1} \sqrt{n_2/n_1}$. Since the density is maximized by taking $p_i = n_i/n$,
if this is possible, only one of these solutions is the M.L. estimate. With probability
tending to unity, the M.L. estimate $\theta_n$ is such that $1 - \rho(\theta_n) = n_3/n$.
For $\theta_n$ to be consistent, it must eventually stay in a fixed interval $A_k$ so that
$n_3/n = 1 - a_k$, but the probability of this equality tends to zero as $n$ tends to
infinity.
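The instability in the second example can be made concrete with a small sketch. The paper allows any ordering $\{a_k\}$ of the rationals of (0, 1); the ordering by increasing denominator used below, and the names `rational_index` and `mle_interval_index`, are assumptions made purely for illustration. Given $n$ and $n_3$, the code returns the index $k$ of the interval $A_k$ in which the M.L. estimate must lie, namely the $k$ with $a_k = 1 - n_3/n$.

```python
from fractions import Fraction
from math import gcd


def rational_index(x):
    """Index k of x in the assumed ordering 1/2, 1/3, 2/3, 1/4, 3/4, ...

    Rationals of (0, 1) are listed in lowest terms by increasing
    denominator and, within a denominator, by increasing numerator.
    """
    if not 0 < x < 1:
        raise ValueError("x must lie strictly between 0 and 1")
    k = 0
    for den in range(2, x.denominator + 1):
        for num in range(1, den):
            if gcd(num, den) == 1:
                if Fraction(num, den) == x:
                    return k
                k += 1
    raise AssertionError("unreachable for a rational in (0, 1)")


def mle_interval_index(n, n3):
    """Index k with a_k = 1 - n3/n, so the M.L. estimate lies in (2k, 2k+1)."""
    return rational_index(1 - Fraction(n3, n))
```

With $n = 100$, a count $n_3 = 50$ puts the estimate in $A_0$, while $n_3 = 49$ sends it to an interval with index in the thousands, so a one-unit change in the data moves the estimate arbitrarily far.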


3. Existence of consistent estimates. In the discussion of the first and second 
examples given above, it is stated that there exist consistent estimates. This 
follows from the lemma stated in the present section. 

Consider a situation where the following assumptions are satisfied: 

(1) Observations are made on a sequence of independent identically distributed
variables $\{X_n\}$, $n = 1, 2, \cdots$, taking their values in a Euclidean
space $\mathcal{X}$.

(2) The parameter space $\Theta$ is a measurable subset of a Euclidean space.

(3) To each $\theta \in \Theta$ there corresponds a measure $P_\theta$ on $\mathcal{X}$, and the distribution
of the sequence $\{X_n\}$ is the product measure corresponding to a $P_\theta$ of the family
$\{P_\theta ; \theta \in \Theta\}$.

(4) $P_{\theta_1} = P_{\theta_2}$ implies $\theta_1 = \theta_2$.

(5) $\Theta$ is a locally compact subset of a Euclidean space and the map $\theta \to P_\theta$
is continuous in the sense that if $\theta_n \to \theta$, then $P_{\theta_n} \to P_\theta$ for Paul Lévy’s distance.

One can easily obtain the following proposition:

LEMMA 1. Let assumptions (1) to (5) be satisfied. Then there exists a sequence
$\{T_n\}$ of estimates such that for every positive $\epsilon$ and every compact set $K \subset \Theta$ the
quantity

\[
\sup_{\theta \in K} P[\,|T_n - \theta| > \epsilon \mid \theta\,]
\]

tends to zero as n tends to infinity.
The proof of this lemma has been given elsewhere, see [5]. It entails that 
Cramér’s conditions in [3], when supplemented by (4), above, imply the existence 
of consistent estimates. Even in the simple case described by assumptions (1)
to (4), the problem of finding necessary and sufficient conditions for the existence 
of consistent estimates has not been solved. Partial results have been obtained 
by C. Stein [6] and Doob [7]. 


4. Independent, not identically distributed, variables. In the case of inde- 
pendent identically distributed variables, the lemma given in Section 3 ensures 
the existence of consistent estimates in a wide variety of circumstances. If the 
variables are not identically distributed, much more freedom is available, as 
indicated by the next example which possesses the following characteristics: 

(1) Wald’s conditions [4] and the condition of identifiability are satisfied. 

(2) There does not exist any consistent estimate. 

Let $\Theta$ be the open interval $(0, 3\pi)$. For each $\theta$, let $X_{2i}$ be normal with mean
$\cos a_i\theta$ and variance 1 and let $X_{2i+1}$ be normal with mean $\sin\theta$ and variance 1.
It can be verified that if $a_i$ tends to unity, Wald’s conditions for the existence of
consistent roots are satisfied. However, a necessary condition for the existence
of consistent estimates given in [8] is not always satisfied. In the present case,
this condition would require that for any two values $\theta_1$, $\theta_2$ the quantity

\[
n(\sin\theta_1 - \sin\theta_2)^2 + \sum_{i=1}^{n} [\cos a_i\theta_1 - \cos a_i\theta_2]^2
\]

increases to infinity. If $\theta_2$ is taken equal to $\theta_1 + 2\pi$ and the $a_i$’s tend to unity
sufficiently fast, this condition is not satisfied, although the condition of identifiability
can readily be satisfied.
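The failure of the necessary condition can be seen numerically. The sketch below assumes the particular choice $a_i = 1 + 2^{-i}$, which is only one hypothetical way of letting the $a_i$ tend to unity quickly; the function name `separation` is likewise an illustration. For $\theta_2 = \theta_1 + 2\pi$ each cosine difference is at most $2\pi(a_i - 1)$ in absolute value, so with this choice the sum stays below $4\pi^2/3$ for every $n$.

```python
import math


def separation(theta1, theta2, n, a):
    """n*(sin t1 - sin t2)^2 + sum_{i=1}^{n} [cos(a_i t1) - cos(a_i t2)]^2.

    `a` is a function returning a_i for i = 1, 2, ...
    """
    sin_part = n * (math.sin(theta1) - math.sin(theta2)) ** 2
    cos_part = sum(
        (math.cos(a(i) * theta1) - math.cos(a(i) * theta2)) ** 2
        for i in range(1, n + 1)
    )
    return sin_part + cos_part


def a_fast(i):
    """Assumed sequence a_i = 1 + 2^(-i), tending to unity geometrically."""
    return 1.0 + 2.0 ** (-i)
```

Comparing `separation(1.0, 1.0 + 2 * math.pi, n, a_fast)` for growing `n` shows the quantity levelling off, whereas for a pair such as (0.5, 2.0) it grows linearly in `n`.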

It should be noted that the above is not contradictory to Wald’s assertion 
that there is a sequence of roots which converge to the true parameter value. 
However there can be, as in this example, more than one limit point to the set 
of all roots. There is no consistent estimate because it cannot be determined from 
the sample values alone which convergent subsequences of the roots are the 
appropriate ones. 
ON THE UNIQUENESS OF WALD SEQUENTIAL TESTS¹
By LIONEL WEISS
University of Oregon 


1. Summary. Under certain mild restrictions on the distributions involved, 
it is shown that the probabilities of the two types of error uniquely determine 
the two bounds characterizing the Wald sequential probability ratio test. 


2. Introduction. $X_1, X_2, \cdots$ is an infinite sequence of independent and identically
distributed chance variables. The density of $X_1$ is $f_i(x)$ under $H_i$, where
$i = 1, 2$. We assume that under either $H_1$ or $H_2$ the chance variable $f_2(X_1)/f_1(X_1)$
has a distribution which assigns a positive probability to any nondegenerate
interval in the interval $[0, \infty]$, and zero probability to any point in that interval.

$B$, $A$ shall denote the stopping bounds characterizing the usual Wald sequential
probability ratio test. As usual, $B < A$. $Q_i[R; T]$ shall denote the probability
under $H_i$ that the value of the final probability ratio is in the region $R$, when
the sequential stopping rule is to stop the first time the value of the probability
ratio is in $T$, and not before; $u(z)$ shall denote the set of numbers less than or
equal to $z$; $v(z)$ shall denote the set of numbers greater than or equal to $z$. The
union of any two sets $R$ and $T$ shall be denoted by $R + T$. We note the following
easily proved inequality for future reference: if $b$, $a$ are any two finite positive
numbers with $b < a$, then

\[
Q_2[u(b); u(b) + v(a)] < b \cdot Q_1[u(b); u(b) + v(a)].
\]

In what follows, $\theta_1, \theta_2, \cdots$ shall be numbers between zero and one.

For any given $B$, $A$, we denote by $\alpha(B, A)$ the probability of accepting $H_2$
when $H_1$ is true when using the Wald test with bounds $B$, $A$; while $\beta(B, A)$ denotes
the probability of accepting $H_1$ when $H_2$ is true and the Wald test with
bounds $B$, $A$ is used.
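The inequality $Q_2[u(b); u(b) + v(a)] < b \cdot Q_1[u(b); u(b) + v(a)]$ noted above can be illustrated by simulation. The sketch below is an assumption-laden example rather than anything from the paper: it takes $f_1$ to be the N(0, 1) density and $f_2$ the N(1, 1) density, so that $\log f_2(x)/f_1(x) = x - 1/2$, and it requires $b < 1 < a$ so that the ratio starts between the bounds. The function name and its parameters are hypothetical.

```python
import math
import random


def lower_exit_probability(mean, b, a, trials, rng):
    """Monte Carlo estimate of Q[u(b); u(b) + v(a)] when X_i ~ N(mean, 1).

    mean = 0.0 corresponds to H_1 and mean = 1.0 to H_2 in this
    illustrative normal model; requires b < 1 < a.
    """
    log_b, log_a = math.log(b), math.log(a)
    hits = 0
    for _ in range(trials):
        log_ratio = 0.0
        # Continue sampling while the probability ratio is strictly between b and a.
        while log_b < log_ratio < log_a:
            x = rng.gauss(mean, 1.0)
            log_ratio += x - 0.5  # log f_2(x)/f_1(x) for N(1,1) against N(0,1)
        if log_ratio <= log_b:
            hits += 1
    return hits / trials
```

With $b = 0.2$ and $a = 5$, the estimate under $H_2$ comes out well below $b$ times the estimate under $H_1$, the slack being due to the overshoot of the ratio past $b$ at stopping.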


3. Proof of uniqueness. Let $\alpha$, $\beta$ be two given numbers between zero and one,
such that the equalities $\alpha(B, A) = \alpha$ and $\beta(B, A) = \beta$ imply the strict inequalities
$0 < B < A < \infty$. Then we have:

THEOREM. There is at most one solution to the equations $\alpha(B, A) = \alpha$, $\beta(B, A) = \beta$,
the unknowns being $B$, $A$.

PROOF. We assume that there is at least one solution to these equations. Let
$B$ be any number for which it is possible to find an $A$ greater than $B$ with

\[
\alpha(B, A) = \alpha.
\]


Received August 29, 1955. 
Fixing $B$, we shall show that the equation $\alpha(B, A) = \alpha$ is satisfied for exactly
one value of $A$. This is so because for a fixed $B$, $\alpha(B, A)$ is a strictly decreasing
function of $A$ under the assumptions made above. We denote the value of $A$
satisfying $\alpha(B, A) = \alpha$ by $A(B)$. It is easily seen that $A(B)$ is a strictly decreasing
and continuous function of $B$, and the set of all $B$ for which $A(B)$ exists is an
interval.

From now on, we shall denote $\beta(B, A(B))$ by $\beta(B)$. Our next step is to show
that $\beta(B)$ is a strictly increasing function of $B$, and this will complete the proof
of the theorem. For a given $B$, we can find a positive $\Delta B$ so small that $B + \Delta B < A(B + \Delta B)$.
We denote $A(B) - A(B + \Delta B)$ by $\Delta A$. We denote by $R$ the set
of numbers no greater than $B$, by $S$ the set of numbers between $B$ and $B + \Delta B$,
by $T$ the set of numbers between $A(B) - \Delta A$ and $A(B)$, by $U$ the set of numbers
greater than $A(B)$, and finally we denote the set $R + S + T + U$ by $V$.
We have the following relationships, where $z$ is the variable of integration:


ap) = airsvit+ [~~ a[u(2) su (2) + (4) |dastuev) 


+ £2.0L+@)@)+0()}-amen 


(3.1) 


\[
\beta(B + \Delta B) = Q_2[R; V] + Q_2[S; V]. \tag{3.2}
\]


Then we get 


6(B + AB) — 8(B) = Q.1S; V) 


-fef+@)@)+-(4) anon 
22a +(8)oe(2)+o(2)] e000 


Also, we have that the expression we get by replacing the subscripts 2 in the
right-hand side of (3.1) by the subscripts 1 is equal to $1 - \alpha$, as is the expression
on the right-hand side of (3.2) when the same change of subscripts is made. Then
we get by subtraction
Q[S;V] = e of (2) 5 u (2) +v (4)| dQi{u(z); V] 
z Ss ™ [ (2) ee (2) ry (4)| dQlu(z); V). 


Using some obvious continuity properties of the function $Q_i$, we get


(3.5) QAS; V] = (B + 6,4B)-Q,[S; V], 


(3.4) 


and combining (3.3), (3.4), and (3.5), we get 
8(B + AB) — 8(B) 


= (B + 0,AB) ‘eo Q [ (2) ;u (2) + v (4)| dQ,{u(z); V] 
gy B+ 0am f afu(Z) su(2)+0(4) Jaane 1 
in r aa| u (2) 5 U (2) +v (4)| dQ.[u(z); V] 
5 ea * [ (7) re (2) re (4)| dQ,{u(z); V). 


Again using continuity properties of Q, , we can write (3.6) as follows: 


B(B + AB) — B(B) 
= (B + 6AB)-Q, [ (saa) a (sass) 
ne oa) QS; V] 
+ (B + 6AB)-Q |» (ey esa) “ (aay aaa) 
‘a =) QUT; V) 
- o[+(eaas) «(ass) * (4a) 08-7 
~ fay 2a) (ays) 


- (azayaxa) | "Gat; VI. 


But Q.[S;V] = (B + 64B)-Q{S; V], while Q.[7; V] = (A(B) — 6-AA)- 
Q,(T; V], and using these relationships in (3.7) we get: 







B(B + AB) — B(B) 
(8 + 94B)-ais:¥1- {0 [« (g=eap) + (a eae) 
ios (r‘us)| 


Tl. . f ¥ lua, ; 
+ Git; Vi- \ 8 + GAB)-C, lu (a - saa) 


A(B) — 0,AA A(B) — 6,4A 


— (A(B) — @AA)-Q2 | (gp ae) 


' (saa) a (ata) | 


Recalling that 


Qs [ ma u(1) +0 (4)| <Q | wa u(1) + »(4)| , 
o-[» (zt) « (atm) + * Ca) 


‘ iw @ E (ats) ” (atm) _ (ats) |: 


and from continuity considerations on $Q_1$ and $Q_2$, it follows that each of the two
expressions in braces in (3.8) becomes positive for small enough $\Delta B$. This proves
that $\beta(B)$ is strictly increasing in $B$, and completes the proof of the theorem.


4. Extensions. All the results above go through in the same way under the
following somewhat less restrictive conditions: Under either $H_1$ or $H_2$, the
chance variable $f_2(X_1)/f_1(X_1)$ has a continuous distribution which assigns a
positive probability to any nondegenerate subinterval of $[C, D]$, where $0 \le C < 1 < D$;
and the equalities $\alpha(B, A) = \alpha$ and $\beta(B, A) = \beta$ imply the strict
inequalities $C < B < A < D$.
A NOTE ON BHATTACHARYYA BOUNDS FOR THE NEGATIVE
BINOMIAL DISTRIBUTION 


By V. N. Murty 


University of North Carolina 


In the lecture notes of Professor Lehmann on the theory of estimation [1], 
the first two Bhattacharyya lower bounds for the variance of an unbiased esti- 
mate of p for the negative binomial have been calculated. It is of some interest 
to know how the successive bounds turn out, and whether they tend to pq, 
which we know to be attainable. The object of the present note is to give an 
explicit expression for the k-th lower bound and show that it tends to pq. 

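If $X$ has a negative binomial distribution, then we know that

\[
P(X = x) = q^{x} p, \qquad x = 0, 1, 2, \cdots, \tag{1}
\]

where $q = 1 - p$. Let

\[
S_n = \frac{1}{P(x)} \frac{\partial^{n} P(x)}{\partial p^{n}}. \tag{2}
\]

Then it is easily verified that

\[
S_n = \frac{(-1)^{n} x^{(n)}}{q^{n}} + \frac{(-1)^{n-1}\, n\, x^{(n-1)}}{p\, q^{n-1}}, \tag{3}
\]

where

\[
x^{(m)} = x(x - 1) \cdots (x - m + 1).
\]

Therefore,

\[
S_m S_n = \frac{(-1)^{m+n}}{q^{m+n}} \left[ x^{(m)} x^{(n)} - \frac{q}{p}\, m\, x^{(m-1)} x^{(n)} - \frac{q}{p}\, n\, x^{(m)} x^{(n-1)} + \left(\frac{q}{p}\right)^{2} mn\, x^{(m-1)} x^{(n-1)} \right]. \tag{4}
\]

It is well known that

\[
E[x^{(m)}] = m! \left(\frac{q}{p}\right)^{m}, \tag{5}
\]

and we have the algebraic identity

\[
x^{(m)} x^{(n)} = \sum_{r=0}^{m} \binom{m}{r} n^{(r)} x^{(m+n-r)}, \tag{6}
\]

where $n^{(r)} = n(n - 1) \cdots (n - r + 1)$. Therefore,

\[
E(x^{(m)} x^{(n)}) = \sum_{r=0}^{m} \binom{m}{r} n^{(r)} (n + m - r)! \left(\frac{q}{p}\right)^{n+m-r}. \tag{7}
\]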
Using (7) in (4), after some simplification we have


(8) E(SaSn) = i. Sl at em —1— 2)! (mn — r) (” "\ oe ‘ 


q™p ntm cae (n om r)! 
Let $\lambda_{mn} = E(S_m S_n)$. Putting $m = 1, 2, 3$, and 4 in (8), we have in particular,


n+l 
Ain = — = eee as 


n+2 ' 
ag os SO Slat tem 


ht 
. (<-)™ 
a 

( om 1 - +4 


Na = “ar - n! [24q° + 72q°(n — 1) + 36q(n — 1)(n — 2) 


160" + 12q(n —1) + 3n(n — 1) + 6), 


+ 4n(n + 1)(n — 7) + 24(8n — 1]. 
The k-th lower bound is given by 


| Ave Nog - °° Xs os Den °° 
(10) Ly «je hes *°° a hee *° 
bia Nes * °° 


To evaluate the denominator, we multiply the first row by $2/p$ and add the second
row to it; the second row is multiplied by $3/p$, and the third row is added
to it; and so on. A successive application of this procedure will reduce the determinant
to a triangular one, the value of which is easily computed. We thus have,


lie _ {k! (k —1)!--- 117 


nk(k+1)/2yk(R+1) 
¢ P 


ot id ee +0 te Sk 


k24k—2/2 k2+k—2 
q'*+ [ak + 


From (11) we have

\[
L_k = p^{2} q\,(q^{k-1} + q^{k-2} + \cdots + q + 1).
\]

Therefore,

\[
\lim_{k \to \infty} L_k = p^{2} q/(1 - q) = pq.
\]
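The successive bounds can be recomputed exactly for a numerical value of $p$. The sketch below assumes the reading $P(X = x) = q^{x}p$ of (1), builds $\lambda_{mn} = E(S_m S_n)$ from the expansion (4) and the moment formula (7), and takes the $k$-th bound to be the leading entry of the inverse of the matrix $(\lambda_{mn})$, which is the usual Bhattacharyya form when the estimand is $p$ itself. All function names are hypothetical, and rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction
from math import comb, factorial


def falling(n, r):
    """Falling factorial n(n-1)...(n-r+1); equals 1 when r == 0."""
    out = 1
    for i in range(r):
        out *= n - i
    return out


def mixed_moment(m, n, ratio):
    """E[x^(m) x^(n)] as in (7), with ratio = q/p."""
    return sum(
        comb(m, r) * falling(n, r) * factorial(m + n - r) * ratio ** (m + n - r)
        for r in range(m + 1)
    )


def lam(m, n, p):
    """lambda_mn = E(S_m S_n), for m, n >= 1, from the expansion (4)."""
    q = 1 - p
    ratio = q / p
    inner = (
        mixed_moment(m, n, ratio)
        - ratio * m * mixed_moment(m - 1, n, ratio)
        - ratio * n * mixed_moment(m, n - 1, ratio)
        + ratio ** 2 * m * n * mixed_moment(m - 1, n - 1, ratio)
    )
    return (-1) ** (m + n) * inner / q ** (m + n)


def bhattacharyya_bound(k, p):
    """Leading entry of the inverse of the k-by-k matrix (lambda_mn)."""
    # Augmented system (Lambda | e_1), solved by exact Gauss-Jordan elimination.
    rows = [
        [lam(i, j, p) for j in range(1, k + 1)] + [Fraction(int(i == 1))]
        for i in range(1, k + 1)
    ]
    for c in range(k):
        pivot = next(r for r in range(c, k) if rows[r][c] != 0)
        rows[c], rows[pivot] = rows[pivot], rows[c]
        rows[c] = [v / rows[c][c] for v in rows[c]]
        for r in range(k):
            if r != c and rows[r][c] != 0:
                factor = rows[r][c]
                rows[r] = [vr - factor * vc for vr, vc in zip(rows[r], rows[c])]
    return rows[0][k]
```

For instance, with `p = Fraction(1, 3)` the values of `bhattacharyya_bound(k, p)` for small `k` can be printed and compared with $p^{2}q(1 + q + \cdots + q^{k-1})$ and with the limit $pq$.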
ABSTRACTS OF PAPERS


(Additional abstracts of papers presented at the Detroit meeting of the Institute, 
September 7-10, 1956) 


1. Further Applications of Information Theory to Multivariate Analysis and
Statistical Inference, (Preliminary Report), MORTON KUPPERMAN, The
George Washington University, (By Title).


A generalized statistic based on the Kullback-Leibler measure of information is defined 


\[
2nI^{*} = 2n \int f(x, \theta^{*}) \log \frac{f(x, \theta^{*})}{f(x, \theta_0)} \, d\lambda(x),
\]


where the vector $\theta^{*}$ of $h$ components is any consistent, asymptotically normal, efficient
estimator and $\theta_0$ is specified. $2nI^{*}$ is used to test the hypothesis $H_0$: The sample is from a
specified multivariate multiparameter population (not necessarily normal). The asymptotic
distribution of $2nI^{*}$ under $H_0$ is chi-square with $h$ d.f. $I^{*}$ is modified to test the
hypothesis $H_0$: $r$ ($\ge 2$) samples are from the same general multivariate population, parameters
not specified; its asymptotic distribution under $H_0$ is chi-square with $(r - 1)h$ d.f.
Corresponding results are obtained for divergence-statistics based on the divergence $J(1, 2)$.
Large-sample distributions of $I^{*}$ and $J^{*}$ under alternative hypotheses are approximated by
noncentral chi-square distributions.

For any multivariate multiparameter distribution admitting sufficient statistics, $-\log \lambda = I$,
where $\lambda$ is the likelihood-ratio criterion and $I$ uses maximum-likelihood estimators.

Information theory is applied to hypothesis testing, Pearson’s chi-square test of goodness
of fit, and the derivation of exact sampling distributions of sufficient statistics. It is
shown that the set of sufficient estimators of population parameters appearing explicitly in
any Koopman-Pitman distribution (admitting sufficient statistics) are distributed jointly
in a Koopman-Pitman distribution. (Received July 19, 1956.)


2. Generalization of Thompson’s Distribution, ANDRE G. LAURENT, Michigan
State University.


Generalization of Thompson’s Distribution. 1.1.) Let $X = (X_1, \cdots, X_N)$ be $N(m, \sigma)$ distributed,
$\bar{X} = \Sigma X_i/N$, $s^2 = \Sigma (X_i - \bar{X})^2/N$ and $t = (X_i - \bar{X})/s$. It is well known (W. R.
Thompson, 1935) that $t^2/(N - 1)$ is Incomplete Beta distributed and that this distribution
is also the conditional distribution of any $X_i$, given $\bar{X}$ and $s$. Three generalizations of that
result are presented. 1.2.) If — = (1, ++: , &)’ is a subsample from X, the p.d-f. of t = 
(¢ — X)/s is [1 — t(8i;/N + 1/N(N — k))t'\4-*-9 2 PE(N — 1)/2]/*8T[(N — k — 1)/2] 
N(k-)/2(N — k)/2, This provides also the conditional distribution of &, given X and s. 2.1.) 
If a vector X = (X’, --- , X”) is N(m, Z) distributed and if t = (¢’, --- , £?) is any observa- 
tion from a sample (X) = (X,,°-+ , Xy) ‘with mean m’ and covariance matrix S, the 
cond. p.d.f. of £, given m’ and S, is [1 — ( — m’)S-'(— — m’)’}(%—?-9!2 | § |-12((N — 1)x]}2 
r[(N — 1)/2]/r[(N — 1 — p)/2] 2.2.) The latter result is generalized to the case where a 
subsample (£) = (f:,--: , &)’ is drawn from (X). The conditional distribution of (¢), 
given m’ and S, is a generalized multivariate Incomplete Beta distribution. These results 
make it possible to obtain and study the U.M.V. unbiased estimates of functions of the
population parameters, with obvious applications in the fields of S.Q.C., bombing problems,
etc., and tolerance regions investigations. (Received July 23, 1956.)
3. A New Class of PBIB Designs, DALE M. MESNER, Purdue University and
Michigan State University.


In partially balanced incomplete block (PBIB) designs of Latin square type with $g$ constraints
and $n^2$ treatments, algebraic expressions in the integers $n$ and $g$ give values of the
parameters $n_i$ and $p^{i}_{jk}$ if $n$ and $g$ are positive. Some negative values of $n$ and $g$ lead to negative
parameter values which cannot occur in a design, but others give non-negative values
which differ from those for any known designs and suggest the existence of a new type of
design with $n^2$ treatments. Five families of the new designs, referred to here as the “negative
Latin square” type, are constructed, based on association schemes with 16, 64, 81
(two schemes) and 100 treatments. Necessary and sufficient conditions are first given that
associate classes in an association scheme may be combined to give a scheme with fewer
classes. For $n^2$ equal to 16, 64, or 81, the new schemes are constructed by combining classes
in association schemes having $n - 1$ classes of $n + 1$ treatments each, constructed from the
field GF($n^2$) by a method similar to that of Sprott [Can. J. Math., Vol. 7 (1955), 369-381].
The scheme with 100 treatments is found by an enumeration method. (Received July 23,
1956.)


4. Some Results for Inverting Patterned Matrices, A. E. SARHAN, Egyptian
Medical Research Laboratories, Cairo, Egypt, and B. G. GREENBERG,
University of North Carolina, (By Title).


Employing the results given in a paper by Ukita “On the Characterization of Diagonal
Matrix of 2-Type and its Application to Order Statistics,” generalizations can be made for
various cases given in the paper by Roy and Sarhan “On Inverting a Class of Patterned
Matrices,” Biometrika, June 1956. In addition, generalizations for getting the inverse of
complex matrices which can be partitioned into submatrices of the form [Da + J] are 
obtained. Matrices of this nature are frequently encountered in varied statistical applica- 
tions, involving least squares, experimental design and order statistics. (Received July 23, 
1956.) 


5. On the Solution of the Functional Equation of Farrell’s Market, A. CHARNES 
and O. P. AGGARWAL, Purdue University. 


A. Charnes and M. Farrell in June 1953 derived à la R. Bellman a functional equation
describing the optimal pricing strategy of a seller in a recurring market of Markovian reaction
characteristics of “kinky oligopoly” type suggested by Farrell (Econometrica Vol.
22, No. 3, July 1954) in his approximate im kleinen analysis of an econometric controversy
on optimality of holding price constant. They established (unpublished) existence and 
uniqueness of continuous solutions of at most exponential growth for this functional equa- 
tion, a type still not subsumed in any yet treated (cf. S. Karlin Naval Research Logistics 
Quarterly Vol. 2, No. 4, Dec. 1955). 

This paper develops explicitly the exact solution for all relevant parameter ranges. Hence
(1) a new method of approximately solving “extremal” functional equations by employing
piece-wise quadratic “forcing” terms is suggested and partially implemented, (2) an im
grossen resolution of the econometric controversy is at hand. (Received August 29, 1956.)


6. Randomization and Experimentation, W. J. YOUDEN, National Bureau
of Standards.


Randomization, often specified as an indispensable requirement in experimental design,
is required only when the order or position of the experimental unit influences the performance
of the unit. Randomization, when required, may give an arrangement that is
obviously undesirable and one that may doom the particular experimental program. A
system of constrained randomization is proposed that eliminates the undesirable arrangements
without sacrificing the customary gains achieved by randomization. (Received
September 14, 1956.)


STATISTICAL RESEARCH MONOGRAPHS 


The Institute of Mathematical Statistics and the University of Chicago announce
the establishment and cosponsorship of a series of publications to be
called Statistical Research Monographs.

The primary purpose of this series is to provide a medium of publication for 
material of interest to statisticians that is not ordinarily provided for by existing 
media. It will help fill the gap between journal articles and textbooks or treatises. 
Among the kinds of publications envisaged are 


New research results too lengthy for the usual journal article. In particular, 
authors will have ample scope for detailed exposition of their findings. 


Research results of interest in both theoretical and applied statistics. At 
present authors of such material frequently find it necessary to publish part 
of their results in a theoretical journal and part in an applied journal. 


Expository monographs in particular areas of statistics. 


Discussions of statistical problems and techniques in particular areas of 
application. 


Every attempt will be made to maintain the highest standards of scholarship. 

The members of the Editorial Board are David Blackwell, William G. Cochran, 
Henry E. Daniels, Wassily Hoeffding, Jack C. Kiefer, and William H. Kruskal 
(chairman). The Editor of the Annals of Mathematical Statistics is an ex-officio
member of the Editorial Board. Members of the Editorial Board are selected by 
the President of the Institute and by the Committee on Statistics of the Uni- 
versity of Chicago, subject to the approval of the Institute’s Council and of the 
University’s Board of Publications. The Editorial Board will act as a group in its 
decisions on manuscripts. 

Institute members will receive a one-third discount on prepublication orders 
for monographs of the Series when such orders are placed through the Treasurer
of the Institute. A smaller discount will apply to postpublication orders. 

Authors are invited to send manuscripts and correspondence concerning the 
Series to William H. Kruskal, 127 Eckhart Hall, University of Chicago, Chicago 
37, Illinois, or to any of the other members of the Editorial Board. 
NEWS AND NOTICES
JOHN WISHART 

Dr. John Wishart, director of the Statistical Laboratory, University of Cam- 
bridge, died in a swimming accident at the Port of Acapulco, Mexico, on July 
14, 1956. As a tribute to Dr. Wishart, the Institute of Mathematical Statistics, 
the American Statistical Association, and the Biometric Society sponsored a 
memorial session on September 9, 1956, at the Detroit meetings. Professor 
Harold Hotelling delivered an address, “Contributions of John Wishart to 
Statistics.” 




Readers are invited to submit to the Secretary of the Institute news items of interest 


Personal Items 


Sidney Addelman, who received a Master of Arts degree in statistics from the 
University of Delaware in June, has accepted a research assistantship from Iowa 
State College where he will continue his studies toward a Ph.D. degree in sta- 
tistics. 

F. J. Anscombe has been appointed Associate Professor in the Department of 
Mathematics, Princeton University. 

Professor George Allen Baker of the University of California, Davis, served as 
Faculty Research Lecturer for 1955-56. He delivered the fourteenth annual 
Faculty Research Lecture, on the topic “Search for Structure.” 

Charles A. Bicking has accepted the position of Manager, Quality Control 
Branch, Research and Development Division, The Carborundum Company, 
Niagara Falls, New York. His responsibilities will include the establishment of 
process quality control in the manufacturing divisions of the company, assessment 
of product quality, design of experiments in research and development, and 
operations research. 

Richard S. Bingham, Jr., has joined the newly organized Quality Control
Branch, Research and Development Division, The Carborundum Company, as 
Senior Engineer. 

Dr. Archie Blake, formerly an Advisory Engineer with Westinghouse Electric 
Corporation, Baltimore, has accepted an appointment as Systems Staff Mathe- 
matician with Bendix Aviation Corporation in Detroit. 

Dr. Ernest E. Blanche is now president and senior research scientist of Ernest 
E. Blanche and Associates. The firm has offices at Rockville and Silver Spring, 
Maryland. 

Robert C. Burton, mathematician in the Statistical Engineering Laboratory 
of the National Bureau of Standards, has been awarded $300 in recognition of 
contributions to the theory of experimental design. The award was made possible 
by the “Special Act or Service” category of the Incentive Awards Act. Mr. 
Burton is now on educational leave from NBS, doing graduate work at the 
University of North Carolina. 

Richard G. Cornell, who received his Ph.D. degree in statistics from Virginia 
Polytechnic Institute in June, is now a statistician in the Commissioned Corps 
of the Public Health Service. He is stationed at the Communicable Disease 
Center, Atlanta, Georgia. 

Assistant Professor Louis J. Cote has left Purdue University to take a position 
at Syracuse University. 

Donald P. Gaver, Jr., received the Ph.D. degree in mathematics from Princeton 
University in June, 1956. He is now employed by the Westinghouse Research 
Laboratory, Pittsburgh 35, Pennsylvania. 

Dr. S. G. Ghurye has accepted an assistant professorship with the Committee 
on Statistics, University of Chicago, commencing in October, 1956. 

William Gomberg, director of the Management Engineering Department of 
the International Ladies’ Garment Workers’ Union and adjunct professor of 
industrial engineering at Columbia University, has been appointed professor of 
industrial engineering at Washington University. 

Shanti S. Gupta has received his Ph.D. degree in statistics from the University 
of North Carolina and is now working with the Statistical Group of the Bell 
Telephone Laboratories, Allentown, Pennsylvania. 

Dr. G. C. Helmstadter, formerly an Associate in Research of the Educational 
Testing Service, Princeton, New Jersey, has joined the faculty of Colorado 
A and M College as Psychometrist in the Office of Student Affairs and Assistant 
Professor in the Department of Psychology and Education. 

Robert G. Hoffmann has been appointed as Statistician for the newly organized 
Commission on Professional and Hospital Activities, Inc. 

W. H. Horton is now Manager of the Experimental Design and Statistical 
Analysis Section of the Materials Engineering Department, Westinghouse 
Electric Corporation. 

Dr. Stanley Isaacson has resigned his position as a Senior Statistician with 
the Semiconductor Department of Westinghouse Electric Corporation in order 
to accept a position as Sales Manager with Gendler Stone Products Company, 
Des Moines, Iowa. He has also been appointed Lecturer in Statistics in the Com- 
munity College of Drake University. 

Eugene Lukacs has resigned from the Office of Naval Research to accept an 
appointment as professor of mathematics at the Catholic University of America. 

John H. MacKay (Ph.D., University of North Carolina, 1956) has accepted a 
position as associate professor in the School of Industrial Engineering, Georgia 
Institute of Technology, Atlanta, where he will also do statistical work in the 
Engineering Experiment Station. 

W. G. Madow of the University of Illinois will be at the Center for Advanced 
Study in the Behavioral Sciences, Stanford, California, during 1956-57. 

Margaret Martin has returned to her position as Associate Professor of Bio- 
statistics in the Department of Preventive Medicine and Public Health at 
Vanderbilt University after spending the academic year studying with the 
Committee on Statistics at the University of Chicago. 





Dr. Irwin Miller, who received his Ph.D. degree in statistics at Virginia Poly- 
technic Institute, is now with the Applied Research Laboratory, United States 
Steel Corporation, in Monroeville, Pennsylvania. 

Joseph M. Moser is now teaching at St. Louis University while working on a 
Ph.D. degree. 

Dr. Mervin E. Muller of the Department of Mathematics, Cornell University, 
has accepted a position with the Scientific Computing Center of the International 
Business Machines Corporation, New York, New York. 

Lt. L. M. Noel, USN, has been transferred from Princeton University to duty 
as executive officer of the U.S.S. Adroit operating out of Charleston, South 
Carolina. 

Jean Roberts, former Director of Public Health Records and Statistics of the 
Minneapolis Health Department, is now Assistant Chief, Division of Research 
and Special Studies, Office of Vocational Rehabilitation, Department of Health, 
Education and Welfare. 

Daniel E. Sands has resigned from the position of statistician with the Squibb 
Institute for Medical Research to accept the position of statistician with the 
Applied Research Laboratory, United States Steel Corporation, Monroeville, 
Pennsylvania. 

Professor Morris Skibinsky, of Purdue University, will spend the 1956-57 
academic year at Michigan State University as Visiting Assistant Professor. 

Arthur Stein, formerly with the Ordnance Communication Command at 
Joliet, Illinois, is now a Principal Research Engineer with the Cornell Aero- 
nautical Laboratory in Buffalo, New York. 

Zenon Szatrowski, Chairman of the Statistics Department, School of Business 
Administration, University of Buffalo, is on leave for the period 1955-1957. His 
present position is that of Staff Consultant at the Scientific Computing Center, 
International Business Machines Corporation. He is working on the application 
of electronic computers to statistical problems. 

Robert J. Taylor has accepted a position as Mathematical Statistician in the 
Biometry Section, National Cancer Institute, NIH, Bethesda, Maryland. 

Dr. John E. Walsh is now with the Military Operations Research Division, 
Lockheed Aircraft Corporation, Burbank, California. 

Oscar Wesler, formerly Acting Assistant Professor of Statistics at Stanford 
University after receiving his Ph.D. degree in mathematical statistics, has been 
appointed Assistant Professor of Mathematics at the University of Michigan. 

John W. Wilkinson received his Ph.D. degree in statistics in June, 1956 from 
the University of North Carolina and has accepted a position as Assistant 
Professor of Mathematics at Queen’s University, Kingston, Ontario, Canada. 

David M. G. Wishart has been appointed Lecturer in Mathematical Statistics 
at the University of Aberdeen, Scotland, and will take up his duties on July 1, 
1956. 







New Members 


The following persons have been elected to membership in the Institute 
May 16, 1956 to August 15, 1956 


Arnaiz Vellando, Gonzalo, Ph.D. (Universidad de Madrid), Profesor adjunto de la Facultad 
de Ciencias Económicas, Universidad de Madrid, Amnistia, 12, Madrid, Spain. 

Bailey, J. H., B.S. (University of Rhode Island), Graduate Assistant, Mathematics Depart- 
ment, University of Utah, 1227 East Third South, Salt Lake City, Utah. 

Barlow, Richard Eugene, M.A. (University of Oregon), Graduate Assistant, Department of 
Mathematics, University of Washington, Seattle 5, Washington. 

Béjar, Juan, D.C.M. (Universidad de Madrid), Profesor de Métodos Estadísticos, Escuela 
de Estadística de la Universidad de Madrid, San Bernardo No. 49, Madrid, Spain. 

Beneš, Václav Edvard, Ph.D. (Princeton University), Member of Technical Staff, Bell Tele- 
phone Laboratories, Murray Hill, New Jersey. 

Benvenuto, Andrew A., M.A. (Syracuse University), Graduate Teaching Assistant, Mathe- 
matics Department, University of Illinois, 115 Pleasant Street, Hartford, Connecticut. 

Bloemena, A. R., M.E. (Institute of Technology, Delft, Holland), Research Fellow, Mathe- 
matical Centre, Amsterdam, Holland, Bolerdiepstraat 54', Amsterdam Z-2, Holland. 

Blumenthal, Saul, B.A. (Central High School of Philadelphia), Student, Cornell University, 
Sibley School of Mechanical Engineering, 313 S. 22nd St., Philadelphia 3, Pennsylvania 

Brennan, D. G., B.S. (Massachusetts Institute of Technology), Staff Member, Lincoln 
Laboratory and Graduate Student, M.I.T., 300 Westgate West, Cambridge 39, Massa- 
chusetts. 

Brown, D. M., B.A. (University of Toronto), Graduate Student, University of Toronto, 
73 St. George Street, Toronto, Ontario, Canada. 

Calhoun, Carolyn M., B.A. (University of Alabama), Graduate Student with Assistantship, 
Psychology Department, University of Alabama, Box 5331, University, Alabama 

Campbell, Louden Lee, B.S. (University of Pittsburgh), District Coordinator, Parke, Davis 
and Company, Department of Clinical Investigation, Joseph Campau at the River, 
Detroit, Michigan. 

Carlson, C. Henry, M.A. (University of Illinois), Research Analyst, Douglas Aircraft 
Company, Santa Monica, California, 866 Haverford, Apt. 10, Pacific Palisades, Cali- 
fornia. 

Carpenter, J. A., M.A. (University of North Carolina), Research Engineer, Melpar Inc., 
Boston, Massachusetts, 10 Forest St., Apt. 16, Cambridge 40, Massachusetts 

Carr, Charles R., M.I.A. (Columbia University), Graduate Student and Teaching Assistant, 
Departments of Statistics and Economics, Stanford University, 2084 Harvard Street, 
Palo Alto, California. 

Chapman, Herman Hollis, Ph.D. (Columbia University), Professor of Business Statistics, 
University of Alabama, University, Alabama. 

Cohen, Leonard, B.S.S. (College of the City of New York), Teaching Lecturer, College of 
the City of New York, 1055 Wheeler Avenue, Bronx 72, New York. 

Collier, Raymond Oliver, Jr., Ph.D. (University of Minnesota), Assistant Professor of Edu- 
cation, University of Minnesota, 1884 N. Oxford St., St. Paul 13, Minn. 

Constantine, Alan Graham, B.S. (University of Western Australia), Research Officer, Divi- 
sion of Mathematical Statistics, C.S.I.R.O., 36 Hardy St., Canterbury, N.S.W., Australia. 

Crystal, Eugene, A.B. (William Jewell College), Mathematician in charge of computing 
laboratory, Textile Research Institute, Princeton, New Jersey, 25 Witherspoon Street, 
Princeton, New Jersey 

Cuttle, Yvonne, G. M. G. (Mrs. P. M.), M.A. (University of British Columbia), Graduate 
Student, University of Oregon, Eugene, Oregon 

Dalton, Jonas M., A.B. (George Washington University), Graduate Student, Virginia 
Polytechnic Institute, Blacksburg, Virginia, 204 Roanoke St., Blacksburg, 
Virginia. 

Davis, Willis L., B.S. (Howard University), Acting Group Leader, Statistical Services, 
Research and Advanced Development Division, AVCO Manufacturing Corporation, 
Stratford, Connecticut, 1501 Seaview Avenue, Bridgeport, Connecticut. 

Dick, Ronald S., B.S. (Queen’s College), Assistant in Mathematical Statistics, Columbia 
University, and Mathematical Statistician, U.S. Census Bureau, Washington, D. C., 
84-66 250th St., Bellerose 26, New York. 

Dresner, A. Joseph, M.A. (New York University), Statistician, Board of Education, City of 
New York, 161-30 Jewel Ave., Flushing 65, Long Island, New York. 

Dunn, Olive Jean (Mrs. R. L.), M.A. (University of California at Los Angeles), Teaching 
Assistant, U.C.L.A. School of Business Administration, 404 9th St., Manhattan Beach, 
California 

Eeden, Constance van, dra (University of Amsterdam), Research Fellow, Mathematical 
Centre, Statistical Department, Edisonstraat g!, Amsterdam, Holland. 

Elmaghraby, Salah Eldin, A., M.Sc. (Ohio State University), Research Assistant, Cornell 
University, and Graduate Student, Industrial Engineering, Cornell University, 409 
Eddy St., Ithaca, New York 

Elteren, Ph. van, M.S. (University of Amsterdam), Sub-chief of the Statistical Department, 
Mathematical Centre, Ritzema Bosstraat 38', Amsterdam O, Netherlands. 

Ferrer Martin, Sebastian, L.C.E. (Universidad de Madrid), Profesor de Técnica del Mues- 
treo, Jefe de la Sección de Metodología, Madrid, Sagunto, 3, Madrid, Spain. 

Fetters, William B., B.S. (Indiana University), Analytical Statistician, U.S. Naval Powder 
Factory, Quality Control Department, Indian Head, Maryland, 122 Seneca Drive, 
Forest Heights, Maryland. 

Fossum, R. R., M.S. (University of Oregon), Research Fellow, Department of Mathematics, 
University of Oregon, Electronic Defense Laboratory, P.O. Box 205, Mountain View, 
California. 

Fukuda, Yoichiro, M.A. (University of California at Los Angeles), Research Mathemati- 
cian, Engineering Department, University of California at Los Angeles, Los Angeles 24, 
California. 

Gafarian, A. V., B.S. (University of Michigan), Student and Teaching Assistant, Depart- 
ment of Mathematics, University of California at Los Angeles, 5386 Dawes Ave., Culver 
City, California 

Gani, J. M., Ph.D. (Australian National University), Nuffield Research Fellow, Statistical 
Laboratory, The University of Manchester, Manchester 13, England. 

Gardiner, Donald A., Ph.D. (North Carolina State College), Member, Mathematics Panel, 
Oak Ridge National Laboratory, Post Office Box P, Oak Ridge, Tennessee. 

Gentry, Charles J., B.S. (University of Florida), Graduate Student, University of Florida, 
Box 2003, University Station, University of Florida, Gainesville, Florida. 

Geoghagen, Randolph R. M., M.A. (Columbia University), Research Assistant, Pupin 
Cyclotron Laboratory, Columbia University, 319 East 162nd St., Bronx 56, New York. 

Gephart, L. S., Ph.D. (University of Florida), Mathematical Statistician, Design of Experi- 
ment Unit, Research and Development Division, Office, Chief of Ordnance, D. A., 
Washington 25, D. C. 

Gershefski, George William, B.M.E. (General Motors Institute), Student, Cornell Uni- 
versity, 127 College Avenue, Ithaca, New York. 

Gessford, John E., M.S. (Stanford), Graduate Student, Statistics Department, Stanford 
University, 637 Alvarado, Stanford, California. 

Gorman, T. P., M.S. (Fordham College), Senior Programmer, New York Scientific Com- 
puting Center, International Business Machines, 590 Madison Ave., New York, New 
York 

Groves, John Ellis, Jr., B.S. (Arkansas State College), Graduate Fellow, St. Louis Uni- 
versity, 359 N. Whittier, St. Louis, Missouri. 

Halter, Albert N., Ph.D. (Michigan State University), Graduate Student and Research 
Instructor, Michigan State University, 2276 Hulett Road, Okemos, Michigan 

Haynam, George E., M.S. (Case Institute of Technology), Research Assistant, Project Doan- 
brook, Case Institute of Technology, 10900 Euclid Avenue, Cleveland, Ohio. 

Hemelrijk, F., Dr.Sc. (University of Amsterdam), Professor in Statistics, Technical Uni- 
versity, Delft, and Chief of Statistical Consultation, Mathematical Centre, Amster- 
dam, Weesperzijde 83", Amsterdam (O), Netherlands. 

Hetz, Wolfgang, Ph.D. (University of Göttingen), Adviser in Statistical Problems, Deut- 
sche Forschungsgemeinschaft, Calsowstr. 20, Göttingen, Germany. 

Hogben, David, B.A. (University of Minnesota), Inspection Development Engineer, West- 
ern Electric Co., 100 Central Ave., Kearny, New Jersey. 

Howes, David R., A.B. (Amherst), Staff Statistician, Chemical Corps Engineering Com- 
mand, Army Chemical Center, Maryland. 

Ikeda, Hiroshi, B.P. (University of Tokyo), Graduate Student, Division of Research in 
Humanities, University of Tokyo, 845 Izumiché, Suginamiku, Tokyo, Japan. 

Kahn, Paul Markham, B.S. (Stanford), Student, Stanford University, 76 Manzanita Road, 
Fairfax, California. 

Keys, Phillis Allen, B.S. (Wayne University), Computer, Ames Laboratory, Atomic Energy 
Commission, and Graduate Student, Statistics Department, Iowa State College, 510 
Forest Glen, Ames, Iowa. 

Kopka, William E., M.A. (Syracuse University), Teaching Assistant, Mathematics Depart- 
ment, Syracuse University, Syracuse 10, New York. 

Lechner, J. A., B.S. (Carnegie Institute of Technology), Graduate Student, Princeton Uni- 
versity, Fine Hall, Box 708, Princeton, New Jersey. 

Leiman, John M., Ph.D. (University of Washington), Chief, Statistical Methodology and 
Analysis Branch, Air Force Personnel and Training Research Center, Box 1557, Lack- 
land Air Force Base, San Antonio, Texas. 

Lenthall, Jerry, B.A. (Swarthmore College), Graduate Assistant, Department of Statistics, 
Stanford University, Box 1431, Stanford, California. 

Lieberman, Alfred, M.A. (University of Southern California), Mathematical Statistician, 
Bureau of Ships, Department of the Navy, Washington 25, D. C., 6019 Strathmore Ave., 
Kensington, Maryland. 

Lindgren, B. W., Ph.D. (University of Minnesota), Mathematics Instructor, University of 
Minnesota, 32 Orlin Avenue S.E., Minneapolis 14, Minnesota. 

McCue, Edmund B., M.S. (University of Michigan), Graduate Assistant, State University 
of Iowa, Iowa City, Iowa, 306 East Jefferson St., Iowa City, Iowa. 

McLynn, James M., M.A. (George Washington University), Research Analyst, Army Lo- 
gistics Project GWV, Washington, D. C., 7417 17th Ave., West Hyattsville, Maryland. 

Mackropoulos, C. L., M.A. (University of Illinois), Assistant, Mathematics Department, 
University of Illinois, c/o Mrs. Mary E. Bourke, 7724 South Marquette, Chicago, Illinois. 

Magness, T. A., M.A. (University of California at Los Angeles), Mathematician, Ramo- 
Wooldridge Corporation, 8820 Bellanca Ave., Los Angeles 45, California, 1042% S. Ser- 
rano Ave., Los Angeles 6, California. 

Malan, D. J., M.A. (Columbia University), Assistant Actuary, South African National 
Life Assurance Company, Sanlamhof, C.P., Union of South Africa, c/o SANLAM, San- 
lamhof, C.P., Union of South Africa. 

Matthes, T. K., B.S. (California Institute of Technology), Graduate Student, California 
Institute of Technology, 1201 E. California St., Pasadena 4, California. 

Merrill, Robert B., B.S. (Purdue University), Quality Control Statistician, Whirlpool- 
Seegar Corporation, Estate Division, Hamilton, Ohio, 5727 Marmion Lane, Cincinnati 
13, Ohio. 

Miller, D. D., B.A. (University of Florida), Graduate Assistant, Department of Economics, 
University of Florida, Box 2798, University Station, Gainesville, Florida. 

Mills, Donald F., M.E. (University of Washington), Graduate Student, University of 
Washington, 5015 16th Ave., N. E., Seattle 6, Washington. 

Mohan, Chandra, M.Sc. (Agra University), Research Scholar, Columbia University, c/o Mr. 
H. D. Khandelwal, 64 Lanercost Road, London S. W. 2, England. 

Morgenthaler, George W., Ph.D. (University of Chicago), Group Leader, Institute for Air 
Weapons Research, University of Chicago, 3827 Woodside Avenue, Hollywood, Illinois. 

Moursund, Andrew F., Ph.D. (Brown University), Professor and Head of Department of 
Mathematics, University of Oregon, Eugene, Oregon. 

Neuwirth, Lee, M.A. (Columbia University), Lecturer, Columbia College, Box 45, Hamilton 
Hall, Columbia College, Morningside Heights, New York, New York. 

Oliveira, Augusto T., B.A. (Portuguese Agricultural Research Station), Statistician, Portu- 
guese Agricultural Research Station, Estação Agronómica Nacional, Sacavém (Lis- 
bon), Portugal, 676 Pammel Court, Ames, Iowa. 

Otsuka, Jun, M.A. (Tokyo University), Student of Doctor Course, Laboratory of Animal 
Anatomy, Faculty of Agriculture, Tokyo University, No. 17, 2-Chome, Kaname-cho, 
Toshima-ku, Tokyo-to, Japan. 

The Parke Mathematical Laboratories, Inc., Applied Mathematicians, has moved from Con- 
cord to larger quarters in the country, Bedford Road, Carlisle, Massachusetts. 

Perry, N. C., Ph.D. (University of Southern California), Associate Professor, Alabama 
Polytechnic Institute, 136 Cedar Crest Drive, Auburn, Alabama. 

Rappaport, Erle, M.S. (University of Michigan), Project Statistician, Aeronautical Radio, 
Inc., Washington, D. C., 713 Center Hunting Towers, Alexandria, Virginia. 

Reed, Frank C., B.S. (University of Redlands), Graduate Student, University of California 
at Los Angeles, Westwood, California, 6150 S. Sepulveda, Culver City, California. 

Rodine, Robert H., B.S. (State Teacher’s College, Pennsylvania), Graduate Research As- 
sistant, Statistical Laboratory, Purdue University, Lafayette, Indiana. 

Rogers, Gerald S., M.A. (University of Washington), Half-time Instructor, State Uni- 
versity of Iowa, Iowa City, Iowa. 

Runnenburg, J. Theo, Ph.D. (University of Amsterdam), Co-worker, Statistics Depart- 
ment, Mathematical Centre, Amsterdam, Overtoom 417, Amsterdam (West), Holland. 

Russell, Thomas S., Ph.D. (Virginia Polytechnic Institute), Assistant Professor, Virginia 
Polytechnic Institute, 1415 Burruss Blvd., Blacksburg, Virginia. 

Selig, Seymour M., B.M.E. (Rensselaer Polytechnic Institute), Engineering Statistician, 
Staff Statistician, Chemical Corps Engineering Command, Army Chemical Center, 
Maryland. 

Seltzer, Frederic, B.S. (City College of New York), Actuarial Student, Metropolitan Life 
Insurance Company, New York City, 23 Kerrigan St., Long Beach, New York. 

Smith, Robert L., M.S. (Virginia Polytechnic Institute), Graduate Student, Virginia 
Polytechnic Institute, University Club, Blacksburg, Virginia. 

Smith, Thaddeus L., M.A. (Columbia University), 625 W. 135th Street, New York, New York. 

Stanley, Julian C., Jr., Ed.D. (Harvard University), Associate Professor of Education, 
University of Wisconsin, 305 S. Owen Drive, Madison, Wisconsin. 

Stinson, Fannie A., M.S. (Howard University), Mathematician, U.S. Navy Hydrographic 
Office, Washington 25, D. C., 3816 10th Street, N. W., Washington 11, D.C. 

Takeuchi, Kei, B.E. (Tokyo University), Graduate Student, Tokyo University, 4-652 
Mabashi, Suginami-ku, Tokyo, Japan. 

Testerman, Jack, B.A. (Oklahoma A. and M. College), Graduate Assistant, Oklahoma A. 
and M. College, Stillwater, Oklahoma, 115 E. Boeing Drive, Midwest City, Oklahoma. 

Thomas, Earl A., B.S. (Columbia University), Senior Reliability Analyst, Research and 
Advanced Development Division, AVCO Manufacturing Corporation, Stratford, Con- 
necticut, 54 Ocean Avenue, Lordship, Connecticut. 

Thomas, G. B., Jr., Ph.D. (Cornell University), Associate Professor and Executive Officer, 
Department of Mathematics, Massachusetts Institute of Technology, Cambridge 39, 
Massachusetts. 

Urbanik, John G., A.B. (University of Rochester), Systems Engineer, Republic Aviation 
Corporation, Farmingdale, New York, 108 Rockville Centre Parkway, Oceanside, New 
York. 

Ury, Hans K., A.B. (University of California), Student and Research Assistant, Department 
of Statistics, University of California, Berkeley, California. 

Ward, Joe H., Jr., Ph.D. (University of Texas), Research Psychologist, Air Force Personnel 
and Training Research Center, Personnel Research Laboratory, Lackland Air Force 
Base, San Antonio, 315 Palm Drive, San Antonio, Texas. 

Wiesen, J. M., M.S. (Iowa State College), Supervisor, Statistical Division, Sandia Corpora- 
tion, Albuquerque, 1609 Cagua Drive, N. E., Albuquerque, New Mexico. 

Yonezawa, Shingo, B.E. (University of Tokyo), Graduate Student, Department of Applied 
Physics, Faculty of Engineering, University of Tokyo, Hongo, Tokyo, Japan 

Yukihide, Okano, B.E. (Tokyo University), Graduate Student, Tokyo University, &6 
Otsuka-machi Bunkyo-ku, Tokyo, Japan. 

Zetterberg, Lars-Henning, L.T. (Royal Institute of Technology, Stockholm), Research 
Engineer, Research Institute of National Defense, Stockholm, Sweden, 5636 Dor- 
chester Ave., Chicago 37, Illinois. 

Zoellner, J. Arthur, M.S. (Iowa State College), Experiment Design and Analysis Statis- 
tician, General Engineering Laboratory, General Electric Company, Schenectady, 
New York. 

Zolczynski, Stephen J., B.S. (University of Alabama), Tabulation Project Planner, Officer 
Education Research Laboratory, AFPTRC, Maxwell Air Force Base, Alabama, 2228 
St. Charles Avenue, Montgomery, Alabama 

Zoroa Terol, Procopio, D.C.M. (Universidad de Madrid), Profesor adjunto de Estadistica 
Matematica, Universidad de Madrid, Avda. Aureliano Ibarra, 1, Alicante, Spain 




Preparation of Concise Tables for Statisticians Now Underway 


A team of statisticians and computers is working on a research project or- 
ganized and directed by Dr. K. C. S. Pillai, UN Senior Statistical Advisor to 
the Statistical Center, University of the Philippines. The project involves the 
preparation of some new statistical tables which will be useful for tests of hy- 
potheses in multivariate analysis. These tables will facilitate tests of equality 
between variate means of several multivariate populations. They will also be 
useful in tests of equality of dispersion matrices in two multivariate populations, 
and of the independence of two sets of variates which follow the multivariate 
normal law. It is hoped that these tables may be completed shortly to render 
them available to statisticians within the current year. These tables, along 
with some others used for tests dealing with univariate problems, will be issued 
by the Statistical Center in a report edited by Dr. Pillai and entitled Concise 
Tables for Statisticians. 


Postdoctoral Study in Statistics 


Awards for study in statistics by persons whose primary field is not statistics 
but one of the physical, biological, or social sciences to which statistics can be 
applied are offered by the Committee on Statistics of the University of Chicago. 
The awards range from $3,600 to $5,000 on the basis of an eleven month resi- 
dence. The closing date for application for the academic year 1957-8 is February 
15, 1957. Further information may be obtained from the Committee on Statistics, 
Eckhart Hall, University of Chicago, Chicago 37, Illinois. 




Preliminary Actuarial Examinations Prize Awards 

The winners of the prize awards offered by the Society of Actuaries to the 
nine undergraduates ranking highest on the score of Part 2 of the 1956 Pre- 
liminary Actuarial Examination are as follows: 
First Prize of $200 

    Pratt, Richard L.            Washington University 

Additional Prizes of $100 each 

    Brillinger, David R.         University of Toronto 
    Earle, Clifford J., Jr.      Swarthmore College 
    Kaplan, Stanley              Cornell University 
    Mosher, Robert E.            Kenyon College 
    Riehm, Carl R.               University of Toronto 
    Rubin, Jerrold               Columbia University 
    Schweitzer, Paul A.          Holy Cross College 
    Soderquist, George D.        Drake University 

The Society of Actuaries has authorized a similar set of nine prizes for the 
1957 examinations on Part 2. 

The Preliminary Actuarial Examinations consist of the following three 
examinations: 


Part 1. Language Aptitude Examination. 
(Reading comprehension, meaning of words and word relationships, 
antonyms, and verbal reasoning.) 

Part 2. General Mathematics Examination. 
(Algebra, trigonometry, coordinate geometry, differential and integral 
calculus.) 

Part 3. Special Mathematics Examination. 
(Finite differences, probability and statistics.) 

The 1957 Preliminary Actuarial Examinations will be prepared by the Edu- 
cational Testing Service under the direction of a committee of actuaries and 
mathematicians and will be administered by the Society of Actuaries at centers 
throughout the United States and Canada on May 15, 1957. The closing date for 
applications is April 1, 1957. 

The Society of Actuaries 
208 South LaSalle Street 
Chicago 4, Illinois 

Quality Control and Applied Statistics Abstract Service 

Interscience Publishers, Inc. announces the inauguration of QUALITY 
CONTROL AND APPLIED STATISTICS ABSTRACTS, a monthly loose-leaf 
abstract service covering the world literature on Quality Control, Operations Re- 
search and Industrial Applications of Statistical Methods of all kinds. More 
than 400 journals will be scanned for articles that present new information in 
the field, and the abstracts will be sufficiently comprehensive to show the signifi- 
cant contribution of each article, so that it will usually be unnecessary to consult 
the original paper. It will consist of one volume of about 1,000 pages yearly, 
divided among 12 issues, beginning in June 1956. The subscription price is 
$60.00 per volume. 




Research Fellowships in Psychometrics 

Princeton, N. J.: The Educational Testing Service is offering for 1957-58 its 
tenth series of research fellowships in psychometrics leading to the Ph.D. degree 
at Princeton University. Open to men who are acceptable to the Graduate School 
of the University, the two fellowships each carry a stipend of $2,500 a year and 
are normally renewable. Fellows will be engaged in part-time research in the 
general area of psychological measurement at the offices of the Educational 
Testing Service and will, in addition, carry a normal program of studies in the 
Graduate School. Suitable undergraduate preparation may consist either of a 
major in psychology with supporting work in mathematics, or a major in mathe- 
matics together with some work in psychology. However, in choosing fellows, 
primary emphasis is given to superior scholastic attainment and demonstrated 
research ability rather than to specific course preparation. The closing date for 
completing applications is January 4, 1957. Information and application blanks 
will be available about October 1 and may be obtained from: Director of Psy- 
chometric Fellowship Program, Educational Testing Service, 20 Nassau Street, 
Princeton, New Jersey. 




Summer Job Opportunities at the National Bureau of Standards 


The Junior Scientist-Engineer program, open primarily to sophomores and 
juniors, is designed to prepare especially well qualified students majoring in the 
physical sciences, mathematics and engineering, for a future professional career 
at the National Bureau of Standards. Approximately 160 college students partici- 
pate in this summer program each year with approximately 40 universities repre- 
sented in the group. The program is a work-study plan. Students who meet pro- 
gram requirements are carried on a “leave without pay” status during the school 
year. With the exception of the GS-1 level, selections are made from Civil Service 
registers which are established as a result of the Student Trainee examination. 
The GS-1 group consists of selected high school graduates who have distinguished 
themselves in the physical sciences and engineering via the Westinghouse Science 
Talent Search or Science Honors on a national basis. The students participate 
in a planned program of orientation, carefully supervised on-the-job training 
assignments, and discussions with trainee advisors who are appointed from each 
technical division. 

Students interested in this program should watch for the Civil Service Com- 
mission’s Student Trainee examination announcement generally posted on 
college bulletin boards sometime during fall or early winter. 




NRC-NBS Research Associateships 


Research associateships, supported by the National Bureau of Standards and 
awarded on recommendations of the National Academy of Sciences—National 
Research Council, are offered to provide young investigators of unusual promise 
and ability the opportunity for basic research in various branches of the physical 
and mathematical sciences. These associateships are open only to citizens of the 
United States and are tenable at the National Bureau of Standards in Washing- 
ton, D. C. Applicants must have the Ph.D. or Sc.D. degree, or their equivalent. 
The term of the appointment is for one calendar year. It is expected that ap- 
proximately 10 awards may be made in a total of fourteen fields, of which the 
following are of particular interest to mathematicians: Pure and Applied Mathe- 
matics; Applied Mathematical Statistics; Numerical Analysis; Statistical Me- 
chanics. Awards will be made about April 1, 1957. Appointments will be for one 
year. The annual gross stipend will be $7035 and will be subject to income tax. 
Requests for application forms and for additional information about require- 
ments for applications should be addressed to the Fellowship Office, National 
Academy of Sciences—National Research Council, 2101 Constitution Avenue, 
N.W., Washington 25, D. C. Applications for the academic year 1957-1958 must 
be received in the Fellowship Office no later than January 11, 1957. 




International Travel Announcement 


The National Science Foundation will award individual grants to defray 
partial travel expenses for a limited number of American scientists participating 
in the following international congresses: 30th Session of the International 
Statistical Institute; Congress of the International Union of the Scientific Study 
of Population. These congresses are scheduled to meet in Stockholm, Sweden, 
August 8 to 15, 1957. Application blanks may be obtained from the National 
Science Foundation, Washington 25, D. C. Completed application forms must be 
submitted by March 1, 1957. 


REPORT OF THE DETROIT MEETING OF THE INSTITUTE OF 
MATHEMATICAL STATISTICS 


The seventy-second meeting of the Institute of Mathematical Statistics, a 
Central Regional Meeting, was held at the Hotels Sheraton-Cadillac and Statler, 
Detroit, Michigan, on September 7-9, 1956. The meeting was in conjunction 
with meetings of the American Statistical Association, the American Sociological 
Society, the Econometric Society, the Biometric Society, the Federation of 
Financial Analysts Societies, the Society for the Study of Social Problems, and 
the Rural Sociological Society. 


A Special Invited Paper, Randomization and Industrial Experimentation, 
was presented by Dr. W. J. Youden, National Bureau of Standards. 
The following members of the Institute attended the Detroit meeting: 


Om P. Aggarwal, Wm. R. Allen, R. L. Anderson, Theo. W. Anderson, Jr., Virgil L. Ander- 
son, Wm. B. Anderson, Kenneth J. Arnold, Wm. Dowell Batten, Z. W. Birnbaum, Chester 
I. Bliss, Colin Ross Blyth, Helen Bozivich, Ralph Allan Bradley, Alva Esmond Brandt, 
Glenn W. Brier, Harold F. Bright, Irwin D. J. Bross, Byron Wm. Brown, Robert W. Burgess, 
Irving W. Burr, Elizabeth Dean Bushell, Joseph M. Cameron, Mavis B. Carroll, Victor 
Chew, Willard H. Clatworthy, Alonzo C. Cohen, Samuel E. Cohen, Wm. Stokes Connor, 
Louis J. Cote, Dudley J. Cowden, Gertrude M. Cox, Cecil C. Craig, Joseph F. Daly, Herbert 
T. David, Besse B. Day, Claude DeCourval, Daniel B. DeLury, Francis R. Del Priore, 
Lucile Derrick, Stuart C. Dodd, James L. Dorby, Acheson J. Duncan, Chas. W. Dunnett, 
Paul S. Dwyer, Marjorie Easterbrook, Churchill Eisenhart, Benjamin Epstein, Chas. F. 
Federspiel, Wm. Brooke Fetters, Lester R. Frankel, Spencer M. Free, Jr., Fred Frishman, 
Donald A. Gardiner, Norman R. Garner, Seymour Geisser, Dorothy Morrow Gilford, Leo 
A. Goodman, Samuel W. Greenhouse, Joseph A. Greenwood, Lee Gunlogson, Paul Gunther, 
Keet W. Halbert, Max Halperin, Albert N. Halter, Morris H. Hansen, Boyd Harshbarger, 
H. Leon Harter, Herman O. Hartley, Wm. C. Healy, Jr., F. M. Hemphill, Leon H. Herbach, 
Irene Hess, Clifford G. Hildreth, Robt. G. Hoffman, Robert Hooke, Wm. H. Horton, Daniel 
G. Horvitz, Harold Hotelling, Earl E. Houseman, Hendrik Houthakker, Cyril C. Hoyt, 
Paul E. Irick, J. Edward Jackson, Palmer O. Johnson, Howard L. Jones, Lawrence F. Jones, 
Hyman B. Kaitz, Edward L. Kaplan, Marvin Kastenbaum, Leo Katz, Harriet J. Kelley, 
Lester S. Kellogg, Oscar Kempthorne, Robt. W. Kennard, George H. Kennedy, Allyn W. 
Kimball, Arnold J. King, Edward P. King, Calvin J. Kirchen, Leslie Kish, Carl F. Kossack, 
Robt. M. Kozelka, Clyde Y. Kramer, Wm. H. Kruskal, T. T. Kwo, Donald E. Lamphiear, 
James F. Lanahan, Andre G. Laurent, Alfred Lieberman, Morris M. Lightstone, Rensis 
Likert, Benjamin Lipstein, Geo. F. Lunger, Albert Madansky, G. L. Marcus, Eli S. Marks, 
Robt. H. Matthias, Paul Meier, Dale M. Mesner, Irwin Miller, Albert Mindlin, Joseph E. 
Morton, Jack Moshman, Hugo Muench, Mervin E. Muller, John Neter, Monroe L. Norden, 
Horace W. Norton, James A. Norton, Edwin G. Olds, Ingram Olkin, Paul S. Olmstead, 
Thos. M. Oneson, Bernard Ostle, Donald B. Owen, Wm. R. Pabst, Nancy S. Parker, John 
F. Pauls, Kan-Chen Peng, Eugene W. Pike, James H. Powell, John W. Pratt, Bruce P. Price, 
Donald L. Richter, David Rubenstein, Daniel E. Sands, F. E. Satterthwaite, Edward Sax, 
Marvin A. Schneiderman, Norman C. Severo, Richard H. Shaw, Walt R. Simmons, Morris 
Skibinsky, H. Fairfield Smith, Paul N. Somerville, Frederick F. Stephan, John N. Stewart, 
Fred L. Strodtbeck, Seiji Sugihara, Zenon Szatrowski, Daniel Teichroew, James G. C. 
Templeton, Benjamin J. Tepping, Milton E. Terry, Donovan J. Thompson, Wm. A. Thomp- 
son, Geo. Wm. Thomson, Leo J. Tick, Chia Kuei Tsao, John W. Tukey, Malcolm E. Turner, 
Elizabeth Vaughan, Joseph Waksberg, Louis Weitter, Alfred G. Whitney, John M. Wiesen, 
Gregory P. Williams, R. Lowell Wine, Gerald Winston, Wm. W. Wolman, Max A. Woodbury, 
Wm. J. Youden, Marvin Zelen, John A. Zoellner 

The program of the Detroit meeting follows: 


FRIDAY, SEPTEMBER 7, 1956 


10:00 a.m. Covariance Analysis, I. Cosponsored with the American Statistical 
Association and the Biometric Society. 


Chairman: A. W. KIMBALL, Oak Ridge National Laboratory. 


Papers: Elements of Covariance, D. B. DELURY, Ontario Research Foundation. 
Interpretations of Regressions in Analysis of Covariance, H. FAIRFIELD SMITH, 
North Carolina State College. 


Discussion: OSCAR KEMPTHORNE, Iowa State College. 


1:00 p.m. Contributed Papers. 


Chairman: JAMES LANAHAN, University of Detroit. 
Papers: 1. Some Results on the Distribution of the Peaks of a Gaussian Process, IRWIN 
MILLER AND JOHN E. FREUND, Virginia Polytechnic Institute. 
2. Unbiased Estimation of the Normal Distribution Function. (Preliminary report), 
WILLIAM C. HEALY, JR., Ethyl Corporation Research Laboratories, Ferndale, 
Michigan. 
3. Unbiased Estimation of Correlation Coefficients, INGRAM OLKIN AND JOHN W. 
PRATT, University of Chicago. (By title) 
4. On a Multivariate Tchebycheff Inequality. (Preliminary report), INGRAM OLKIN 
AND JOHN W. PRATT, University of Chicago. (By title) 
5. A Continuous Time Treatment of the Waiting-time in a Queueing System Having 
Poisson Arrivals, a General Distribution of Service-time, and a Single Service 
Unit. (Preliminary report), VACLAV EDVARD BENEŠ, Bell Telephone Labora- 
tories, Murray Hill, New Jersey. (By title) 
6. Some Results on the Analysis of Random Signals by Means of a Cut-counting 
Process, IRWIN MILLER AND JOHN E. FREUND, Virginia Polytechnic Institute. 
(By title) 
7. A New Class of Partially Balanced Incomplete Block Designs, DALE M. MESNER, 
Purdue University and Michigan State University. 
8. Generalization of Thompson’s Distribution, ANDRE G. LAURENT, Michigan State 
University. 
9. Some Results for Inverting Patterned Matrices, A. E. SARHAN AND B. G. GREEN- 
BERG, University of North Carolina. (By title) 
10. On the Solution of the Functional Equation of Farrell’s Market, A. CHARNES AND 
O. P. AGGARWAL, Purdue University. (By title) 


2:00 p.m. Covariance Analysis, II. Cosponsored with the American Statistical 
Association and the Biometric Society. 


Chairman: GERTRUDE M. COX, North Carolina State College. 


Papers: Covariance Analysis with Unequal Subclass Numbers, WALTER T. FEDERER, Cornell 
University. 


The Analysis of Covariance for Incomplete Block Designs, MARVIN ZELEN, Na- 
tional Bureau of Standards. 
Group Comparisons and Analysis of Variance and Covariance in Cluster Sam- 
pling, H. O. HARTLEY, Iowa State College. 
Discussion: JOHN W. TUKEY, Princeton University. 


4:00 p.m. Applications of Electronic Computers in Statistics. Cosponsored with 
the American Statistical Association and the Biometric Society. 


Chairman: M. A. WOODBURY, New York University. 
Papers: Experiences with SEAC, J. M. CAMERON, National Bureau of Standards. 
How to Control the Digital Computer, R. W. HAMMING, Bell Telephone Laboratories. 
Sampling Experiments, D. TEICHROEW, National Cash Register Company. 
Discussion: ZENON SZATROWSKI, International Business Machines Corporation, New York. 


8:00 p.m. Theoretical Aspects of Sample Surveys. Cosponsored with the Ameri- 
can Statistical Association. 


Chairman: HOWARD L. JONES, Illinois Bell Telephone Company. 
Papers: Estimation of Variances and Composite Estimation Procedures, JOSEPH WAKSBERG, 
Bureau of the Census. 
Unsolved Problems in the Statistics of Survey Sampling, LESLIE KISH, University of 
Michigan. 
Unbiased Ratio Estimators and Their Variances, LEO GOODMAN, University of 
Chicago, AND H. O. HARTLEY, Iowa State College. 
Discussion: D. HORVITZ, North Carolina State College. ALLEN ROSS, State University of 
New York. 


SATURDAY, SEPTEMBER 8, 1956 


9:00 a.m. Critical Problems in New Quantitative Techniques. Cosponsored with 
the American Statistical Association and the American Sociological 
Society. 


Chairman: LEO A. GOODMAN, University of Chicago. 

Papers: Stochastic Models and their Applications to Social Phenomena, JERZY NEYMAN, Uni- 
versity of California, AND WILLIAM KRUSKAL, University of California and 
University of Chicago. 

Measurement and Sampling in Social Research, FREDERICK F. STEPHAN, Princeton 
University. 
Discussion: PAUL F. LAZARSFELD, Columbia University. 


11:00 a.m. Special Invited Paper. 


Chairman: WILLIAM KRUSKAL, University of Chicago. 
Address: Randomization and Industrial Experimentation, W. J. YOUDEN, National Bureau 
of Standards. 


2:00 p.m. Invited Papers on Mathematical Statistics, I. 


Chairman: LEO A. GOODMAN, University of Chicago. 
Papers: The Use of Sample Spacing in Tests of Fit, LIONEL WEISS, University of Virginia 
and University of Oregon. 
A Generalization of Internal Regression for the Fitting of Some Non-linear Models, 
R. WHITE AND O. KEMPTHORNE, Iowa State College. 
On the Relative Position of the Mean and Order Statistics, H. T. DAVID, University 
of Chicago and Iowa State College. 


SUNDAY, SEPTEMBER 9, 1956 


10:00 a.m. Applications of Stochastic Processes. Cosponsored with the American 
Statistical Association and the Biometric Society. 


Chairman: OSCAR KEMPTHORNE, Iowa State College. 
Papers: The After-History of Pulmonary Tuberculosis, A Stochastic Model, DAVID W. ALLING, 
Herman W. Biggs Memorial Hospital. 
The Application of Stochastic Processes to the Kinetics of Enzyme Action, ANTHONY 
F. BARTHOLOMAY, Harvard School of Public Health and Harvard University 
Medical School. 


2:00 p.m. Wishart Memorial Session. Cosponsored with the American Statistical 
Association and the Biometric Society. 


Chairman: GERTRUDE COX, North Carolina State College. 


Address: Contributions of John Wishart to Statistics, HAROLD HOTELLING, University of 
North Carolina. 


4:00 p.m. Invited Papers on Mathematical Statistics, II. 

Chairman: LOUIS J. COTE, Syracuse University. 

Papers: Decision Theory for Polya Type Distributions, JOHN PRATT, University of Chicago. 
Bayes Two-stage Decision Rules, MORRIS SKIBINSKY, Purdue University. 


The Chairman of the Program Committee for the meeting was Stanley Isaacson, 
Des Moines, Iowa. The Assistant Secretary for the meeting was James Lanahan, 
University of Detroit. 
WILLIAM KRUSKAL 
Associate Secretary 




REPORT OF THE SEATTLE MEETING OF THE INSTITUTE 


The seventy-first meeting of the Institute of Mathematical Statistics and the 
nineteenth annual meeting was held at the University of Washington, Seattle, 
Washington on August 21-24, 1956 in conjunction with the national annual 
meeting of the Biometric Society, the American Mathematical Society, the 
Mathematical Association of America, and the Econometric Society. A number 
of the sessions were joint (and are so designated) with these organizations. All 
meetings were held on the University of Washington campus. The following 134 
members of the Institute attended: 


I. J. Abrams, M. S. Ahmed, H. L. Alder, Stephen Allen, C. B. Allendoerfer, A. G. Ander- 
son, F. C. Andrews, K. J. Arnold, R. E. Barlow, C. B. Bell, Z. W. Birnbaum, David Black- 
well, J. R. Blum, R. C. Bose, A. H. Bowker, J. V. Breakwell, H. D. Brunk, K. A. Bush, 
L. D. Calvin, D. G. Chapman, Herman Chernoff, W. S. Connor, D. R. Cox, J. H. Curtiss, 
Yvonne Cuttle, D. A. Darling, W. J. Dixon, P. J. Doyle, J. A. Dudman, Churchill Eisenhart, 
Benjamin Epstein, H. P. Evans, T. S. Ferguson, Martin Fox, J. S. Frame, R. S. Gardner, 
D. W. Gaylor, H. M. Gehman, Dorothy Morrow Gilford, W. A. Golomski, F. A. Graybill, 
Geoffrey Gregory, John Gurland, Donald Guthrie, Bernard Harris, T. E. Harris, L. L. 
Helms, P. G. Hoel, R. V. Hogg, Harold Hotelling, A. S. Householder, M. Iqbal, Walter 
Jacobs, John L. Jaech, P. W. M. John, H. L. Jones, Samuel Karlin, E. S. Keeping, O. M. 
Klose, C. F. Kossack, C. H. Kraft, William Kruskal, G. M. Kuznets, R. B. Leipnik, Jerome 
C. R. Li, G. J. Lieberman, S. P. Lloyd, A. T. Lonseth, F. W. Lott, R. C. McCarty, W. G. 
Madow, Frank Massey, N. U. Mayall, P. L. Meyer, D. F. Mills, Alex Mood, R. A. Moore, 
Lincoln Moses, Stanley W. Nash, J. Neyman, G. E. Nicholson, Jr., D. B. Owen, Mohan 
Pavate, M. P. Peisakoff, R. S. Pinkham, G. B. Price, Ronald Pyke, Howard Raiffa, P. H. 
Randolph, R. R. Read, G. J. Resnikoff, D. L. Richter, Gerald Rogers, S. N. Roy, Herman 
Rubin, R. C. Schneider, Lorraine Schwartz, Elizabeth L. Scott, Franklin Sheehan, M. M. 
Siddiqui, W. L. Smith, G. P. Steck, Charles Stein, Rothwell Stephens, A. D. Stewart, 
R. F. Tate, W. F. Taylor, Henry Teicher, E. A. Thomas, G. B. Thomas, Jr., F. H. Tingey, 
F. H. Trinkl, D. R. Truax, J. R. Vatnsdal, Elizabeth Vaughan, R. E. Walpole, J. E. Walsh, 
L. H. Wegner, J. G. Wendel, Oscar Wesler, Kathleen White, Zivia S. Wurtele, R. K. Zeigler. 


The program of the meeting was as follows: 


TUESDAY, AUGUST 21, 1956 
10:00 a.m. Invited Papers I. 


Place: Room 320, Physics Hall 
Chairman: A. M. MOOD, General Analysis Corporation 
1. The Asymptotic Attainment of Bayes Risk, DAVID BLACKWELL, University 
of California, Berkeley. 
2. Some Problems in Asymptotic Theory, LUCIEN LE CAM, University of Cali- 
fornia, Berkeley. 


11:30 a.m. Special Invited Address. 


Place: Room 320, Physics Hall 

Chairman: WILLIAM KRUSKAL, University of California, Berkeley 
Asymptotic Theory of Kolmogorov, Smirnov, and von Mises Type Statistics, 
DONALD A. DARLING, University of Michigan. 


2:00 p.m. American Mathematical Society Colloquium Lecture. 


Place: Health Sciences Auditorium 
Speaker: SALOMON BOCHNER, Princeton University 
Title: Harmonic Analysis and Probability. 


3:00 p.m. Applications to Physical Sciences. 


Place: Room 320, Physics Hall 
Chairman: HAROLD HOTELLING, University of North Carolina, Chapel Hill 
1. Problem of Rotation of Galaxies of Different Types—Statistical Aspects, N. U. 
MAYALL, Lick Observatory. 
2. Internal Motions in Gaseous Masses of Cosmical Dimensions, GUIDO MÜNCH 
and O. C. WILSON, Mount Wilson and Mount Palomar Observatories, and 
California Institute of Technology. 
3. Review of Certain Astronomical Problems and Their Statistical Treatments, 
J. NEYMAN and ELIZABETH L. SCOTT, University of California, Berkeley. 
4. Use of the rth Brightest Star in a Galaxy as a Distance Indicator, MANDAKINI 
SANE, University of California, Berkeley. 
5. Distribution of the Number of Droplets in Unit Lengths of a Track of a Cosmic 
Ray Particle in a Cloud Chamber, ROBERT READ, University of California, 
Berkeley. 
6. Effect of Expansion of the Universe on the Serial Correlation of Counts of 
Images of Galaxies in Regularly Spaced Squares—A Simplified Model, MARTIN 
FOX, University of California, Berkeley. 


WEDNESDAY, AUGUST 22, 1956 
9:00 a.m. Invited Papers II. 


Place: Room 320, Physics Hall 
Chairman: Mrs. BERNICE BROWN, The RAND Corporation 
1. Confidence Regions for Dependent Regression, PAUL G. HOEL, University of 
California, Los Angeles. 
2. Functional Relationships with All Variables Subject to Error, JOHN GURLAND, 
Iowa State College. 


9:00 a.m. Statistical Problems in Medicine and Biology. (Joint with the 
Biometric Society) 


Place: Room 334, Physics Hall 
Chairman: W. TAYLOR, School of Aviation Medicine, Randolph Air Force Base 
1. Some Nonparametric Techniques, L. MOSES, Stanford University, Stanford. 
2. An Investigation of the Log Transformation of Growth Data, W. BECKER, Uni- 
versity of California, Berkeley, and Western Washington Experimental 
Station, Puyallup. 
3. Reaction Rates in Geometrically Constrained Enzyme Systems, D. JENDEN, 
Naval Medical Research Institute, Bethesda, and University of California 
at Los Angeles. 


10:30 a.m. Invited Papers III. 


Place: Room 320, Physics Hall 
Chairman: BENJAMIN EPSTEIN, Wayne University and Stanford University 
1. Law of Small Numbers, WILLIAM KRUSKAL, University of California, Berke- 
ley. 
2. Optimal Multivariate Tests, CHARLES STEIN, Stanford University. 
3. What Judgments are Sufficient for Statistics? I. J. GOOD, Cheltenham, Eng- 
land. 


1:30 p.m. Biometric Society Special Invited Address. 


Place: Room 320, Physics Hall 

Chairman: J. NEYMAN, University of California, Berkeley 
Models and General Mathematical Principles in Biology and Sociology, N. RA- 
SHEVSKY, University of Chicago. 


1:45 p.m. Mathematical Problems in Incomplete Block Designs. 


Place: Room 334, Physics Hall 
Chairman: BURTON W. JONES, University of Colorado 
1. Recent Advances in Partially Balanced Designs, R. C. BOSE, University of 
North Carolina, Chapel Hill. 
2. Symmetrical Balanced Designs, H. J. RYSER, Ohio State University. 
Discussant: W. J. CONNOR, National Bureau of Standards. 


8:00 p.m. Council Meeting. 
Place: Room 320, Physics Hall 


THURSDAY, AUGUST 23, 1956 
9:00 a.m. Prediction Problems. (Joint with Biometric Society) 


Place: Room 320, Physics Hall 
Chairman: DAVID BLACKWELL, University of California, Berkeley 
1. New Light on the Multiple Correlation Coefficient, HAROLD HOTELLING, Uni- 
versity of North Carolina, Chapel Hill. 
2. Optimal Estimates of Multiple Criteria with Restrictions on the Covariance 
Matrix of Estimated Criteria, PAUL HORST, University of Washington. 
3. Procedural Considerations in Forecasting Populations, V. A. MILLER, Uni- 
versity of Washington. 


10:00 a.m. Inventory Policy and Dynamic Programming. (Joint with Econ- 
ometric Society) 


Place: Room 314, Physics Hall 
Chairman: GEORGE DANTZIG, The RAND Corporation 
1. A Note on the Optimal Character of the (s, S) Policy in the Inventory Problem, 
JACK ABRAMS, University of California, Berkeley. 
2. Optimal Sequential Search Problems, SELMER JOHNSON, The RAND Corpora- 
tion. 
3. Ordering Policy for Poisson Determined Supply and Demand, S. ALLEN and 
G. FEENEY, Stanford Research Institute. 
4. The Min-Max Solution of a One-stage Inventory Problem (20 minutes), HER- 
BERT SCARF, The RAND Corporation. 


11:00 a.m. Invited Papers IV. 


Place: Room 334, Physics Hall 
Chairman: W. J. DIXON, University of California, Los Angeles 
1. The Sequential Item Selection Problem in Classification Studies—The Case 
of Dichotomous Variables, HOWARD RAIFFA, Center for Advanced Study in 
the Behavioral Sciences. 
2. On the Use of Concomitant Variables in the Selection of an Experimental De- 
sign, D. R. COX, University of Cambridge and University of North Carolina, 
Chapel Hill. 


2:00 p.m. Invited Papers V. 


Place: Room 320, Physics Hall 
Chairman: D. R. COX, University of Cambridge and University of North Carolina, Chapel 
Hill 
1. Transient Queue Phenomena, WALTER L. SMITH, University of North Caro- 
lina, Chapel Hill. 
2. Some Queueing Statistics, EDGAR REICH, The RAND Corporation and Uni- 
versity of Minnesota. 
3. Some Models of Birth and Death Processes—Linear Growth and Queueing 
Problems, SAMUEL KARLIN, Stanford University, and JAMES McGREGOR, 
California Institute of Technology. 


Contributed Papers I. 


Place: Room 320, Physics Hall 
Chairman: C. H. KRAFT, University of California, Berkeley 
1. Efficient Small Sample Nonparametric Median Tests with Bounded Significance 
Levels, JOHN E. WALSH, Lockheed Aircraft Corporation. 
2. Bayes Approach to Control of Fraction Defective, JOHN V. BREAKWELL, North 
American Aviation, Inc. 
3. The Distribution of the Extreme Mahalanobis’ Distance from Sample Mean 
(Preliminary Report), YVONNE G. M. G. (Mrs. P. M.) CUTTLE, University 
of British Columbia, (introduced by S. W. Nash). 
4. On the Moments of Order Statistics from a Normal Population, R. C. BOSE 
and SHANTI S. GUPTA, University of North Carolina, Chapel Hill. 
5. A Comparison of the Power Curves of Some Double Sample Tests, DONALD B. 
OWEN, Sandia Corporation. 
6. On Some Nonparametric C-sample Tests, FRED C. ANDREWS, University of 
Nebraska. 
7. An Asymptotically Distribution-free Multiple Comparison Method with Ap- 
plication to the Problem of r Rankings of m Objects, IRENE ROSENTHAL and 
THOMAS S. FERGUSON, University of California, Berkeley. 
8. Some Asymptotic Results on Wald’s Approximate Classification Statistic, 
M. IQBAL, University of North Carolina, Chapel Hill. 
9. On the Studentized Largest and Smallest Chi-squared, K. V. RAMACHANDRAN, 
University of Baroda, India. (By Title) 
10. On the Distribution of Ranks and of Certain Rank Order Statistics, PROFESSOR 
MEYER DWASS, Northwestern University and Stanford University. (By 
Title) 
11. Contributions to Distribution-free Population Comparisons, WILLIAM E. 
PERRAULT and WALDO A. VEZEAU, St. Louis University. (By Title) 
12. Validity of Approximate Normality Values for μ ± kσ Areas of Practical Type 
Continuous Populations, JOHN E. WALSH, Lockheed Aircraft Corporation. 
(By Title) 
13. Maximum Likelihood Estimation of Restricted Parameters (Preliminary Re- 
port), H. D. BRUNK, University of Missouri. (By Title) 
14. Confidence Intervals for the Number of Cells in a Multinomial Population 
with Equal Cell Probabilities, BERNARD HARRIS, Stanford University and 
Department of Defense. (By Title) 


4:00 p.m. Contributed Papers II. (Joint with American Mathematical Society) 


Place: Room 131, Bagley Hall 
Chairman: I. J. GOOD, Cheltenham, England 
1. The Quadratic Birth Process, PETER W. M. JOHN, University of New Mexico. 
2. On a Uniqueness Property not Enjoyed by the Normal Distribution, GEORGE 
P. STECK, Sandia Corporation. 
3. Moment Generating Functions of Quadratic Forms in Serially Correlated 
Normal Variables, R. B. LEIPNIK, University of Washington. 
4. Solution of a Ranking Problem from Paired Comparisons, L. R. FORD, JR., 
The RAND Corporation. 
5. Coincidence Probabilities (Preliminary Report), SAMUEL KARLIN, Stanford 
University, and J. L. McGREGOR, California Institute of Technology. 
6. A Mean Martingale Convergence Theorem, L. L. HELMS, Convair. 
7. Stochastic Convergence of Semimartingales, KLAUS KRICKEBERG, University 
of Wisconsin. 


8. A General Convergence Theorem for Sequences of Stochastic Processes, E. G. 
KIMME, Oregon State College. (By Title) 
9. Almost Sure Everywhere Divergence of Random Series, ARYEH DVORETZKY, 
Hebrew University, Jerusalem, and Columbia University. 


6:00 p.m. Business Meeting. 
Place: Room 320, Physics Hall 


8:00 p.m. Council Meeting. 
Place: Room 320, Physics Hall 


FRIDAY, AUGUST 24, 1956 
9:00 a.m. Invited Papers VI. 
Place: Room 320, Physics Hall 
Chairman: LEO KATZ, Michigan State University 
1. Law of Small Numbers, WILLIAM KRUSKAL, University of California, Berke- 
ley. 
2. Some Non-parametric Tests for Independence, JULIUS R. BLUM, Indiana Uni- 
versity. 
3. Some Non-parametric Generalizations of Analysis of Variance and Multi- 
variate Analysis, S. N. ROY, University of North Carolina, Chapel Hill. 




9:00 a.m. Stochastic Population Problems. (Joint with Biometric Society) 
Place: Room 334, Physics Hall 
Chairman: DOUGLAS CHAPMAN, University of Washington, Seattle. 
1. A Stochastic Model for the Tunneling and Retunneling of Flour Beetles, M. 
AHMED, University of California, Berkeley. 
2. A Stochastic Model for the Number of Beetles on the Surface of Flour, EARL 
R. RICH, University of California, Berkeley. 


11:00 a.m. Invited Papers VII. 


Place: Room 320, Physics Hall 
Chairman: DONALD R. TRUAX, California Institute of Technology 
1. The Distributions of Shadows with Applications to Traffic and Counter Prob- 
lems, HERMAN CHERNOFF, Stanford University. 
2. Bounds for Stochastic Processes, Z. W. BIRNBAUM, University of Washington. 
3. Quasi-Martingales and Stochastic Integrals, HERMAN RUBIN, University of 
Oregon. 


1:30 p.m. Contributed Papers III. 


Place: Room 320, Physics Hall 
Chairman: M. R. MICKEY, The RAND Corporation 
1. Incomplete Sufficient Statistics and Similar Tests, ROBERT A. WIJSMAN, Uni- 
versity of California, Berkeley, (introduced by David Blackwell). 
2. Multi-decision Problems for the Multivariate Exponential Family, DONALD 
R. TRUAX, California Institute of Technology. 
3. Some Distributions Related to D_n^+, Z. W. BIRNBAUM and R. PYKE, University 
of Washington. 
4. Sequential Distribution-free Tolerance Regions, SAM C. SAUNDERS, University 
of Washington. 
5. Idempotent Matrices and Quadratic Forms in the General Linear Hypothesis, 
FRANKLIN A. GRAYBILL and GEORGE MARSAGLIA, Oklahoma A. and M. 
College. 
6. On Infinitely Divisible Random Vectors, MEYER DWASS, Northwestern and 
Stanford Universities and HENRY TEICHER, Purdue and Stanford Universi- 
ties. 
7. Further Contributions to Multivariate Confidence Bounds, S. N. ROY and R. 
GNANADESIKAN, University of North Carolina, Chapel Hill. 
8. Contributions to Univariate and Multivariate Components of Variance Analysis, 
S. N. ROY and R. GNANADESIKAN, University of North Carolina, Chapel 
Hill. 
9. The Linear Hypothesis, Information, and the Analysis of Variance, (Pre- 
liminary Report), CHESTER H. McCALL, JR., The George Washington Uni- 
versity. (By Title) 
10. A Sequential Multiple Decision Procedure for Selecting the Multinomial Event 
with the Largest Probability (Preliminary Report), R. E. BECHHOFER, Cornell 
University, and M. SOBEL, Bell Telephone Laboratories. (By Title) 
11. On the Existence of Uniformly Efficient Estimates, R. R. BAHADUR, University 
of Chicago. (By Title) 
12. Definite Quadratic Forms and Discontinuous Factor, ANDRÉ G. LAURENT, 
Michigan State University. (By Title) 
13. A Further Contribution to the Theory of Univariate Sampling on Successive 
Occasions (Preliminary Report), B. D. TIKKIWAL, University of North 
Carolina and Karnatak University. (By Title) 
14. Invariance, Sequential Decision Functions, and Continuous Time Processes, 
J. KIEFER, Cornell University. (By Title) 
15. On the Construction of Fractional Factorial Designs, ROBERT C. BURTON, 
National Bureau of Standards, (introduced by W. C. Connor). (By Title) 


CHARLES KRAFT 
Associate Secretary 




MINUTES OF THE ANNUAL BUSINESS MEETING 
1956 


A business meeting of the Institute of Mathematical Statistics was called to 
order at 6:00 p.m., August 22, 1956, in Room 320, Physics building, University 
of Washington, Seattle, by President David Blackwell. Approximately 42 persons 
were present. A special announcement was made by G. E. Nicholson about the 
tragic death of John Wishart. 

Minutes of the Annual Business Meeting held in New York in December, 
1955, were read and approved. Z. W. Birnbaum introduced a resolution which 
had been recommended by the Council that the Kingston Policy on holding un- 
segregated meetings be made the permanent policy of the Institute. After con- 
siderable discussion this motion was passed. 

The reports of the Editor, Treasurer, Secretary, and Program Coordinator were 
presented and approved. It was announced by the Secretary that Dorothy 
Morrow Gilford had been appointed by the Council to be Associate Secretary 
for the Eastern Region to fill Allan Birnbaum’s unexpired term. 

The tellers were instructed to accept ballots from members who had not 
relayed them by mail. 

The President presented his report and turned the chair over to new President 
Alexander Mood. President Mood extended the thanks of the Institute to the 
outgoing president for his work for the Institute during the past year. 

Lucien LeCam moved that meetings of the Council be open for all members 
of the Institute to observe and listen. This motion passed. 

W. J. Dixon introduced a resolution of thanks to the University of Washington 
administration and the arrangements committee for the meeting. This was 
passed. 

The tellers announced the election of the following: 

President-Elect: L. J. Savage 
Members of I.M.S. Council for term 1956-1959: 
T. W. Anderson 
M. S. Bartlett 
J. Berkson 
Erich Lehmann 

The meeting was adjourned at 8:15 p.m. 

GEORGE E. NICHOLSON, JR. 
Secretary 


REPORT OF THE PRESIDENT OF THE INSTITUTE FOR 1956 


The affairs of the Institute ran smoothly during 1956. 

We have 1649 members as compared with 1505 a year ago. A substantial part 
of the increase can be attributed to the work of this year’s Committee on Indi- 
vidual Memberships, under the chairmanship of Benjamin Epstein. The Council 
voted special thanks to this committee for an outstanding job. 

We expect the Annals to reach a record size of 1225 pages this year, a consider- 
able increase over last year. In spite of this increase, we anticipate that the 
Institute will, financially, about break even. 

The Council has voted to hold a Summer Institute in 1957 on analysis of 
variance provided we get a grant from the National Science Foundation for this 
purpose. The organizing committee for the 1957 summer institute, with T. W. 
Anderson as chairman, is continuing its work. 

In addition to the annual meeting at Seattle on August 21-24, with the 
American Mathematical Society, the Institute held an Eastern Regional meeting 
in Princeton on April 20-21 and a Central Regional meeting at the University 
of Chicago on April 27-28. Future meetings already being contemplated are a 
Central Regional meeting in Detroit on September 7-9, a special meeting in 
conjunction with the AAAS in New York during the Christmas 1956 holidays, 
an Eastern Regional meeting at Catholic University on March 7-9, 1957, the 
annual 1957 meeting with the American Statistical Association in Atlantic City 
early in September, 1957, and the annual 1958 meeting with the American 
Mathematical Society, the time and place of which have not been determined. 

Our committee on Professional Standards, under the chairmanship of B. F. 
Kimball, has prepared a letter, which it plans to send to various personnel 
officers in state and local governments, concerning appropriate standards for 
statisticians in government work. 

We were happy to be able to invite five distinguished Russian probabilists, 
Kolmogorov, Hincin, Linnik, Gnedenko, and Prohorov, to attend our Seattle 
meeting. Unfortunately, because of delays in official machinery, the invitations 
could not be transmitted until July 2, and none of the Russians was able to 
come on such short notice. Kolmogorov wrote a most cordial letter, heartily 
endorsing the desire expressed in our letter to establish a closer contact between 
scholars of our two countries, and expressing the hope that some Soviet proba- 
bilists would be able to attend a later meeting of the Institute of Radio Engineers 
on information theory at MIT on September 10-12. 

It is a pleasure to announce that the Rietz Lecture Committee, with J. Ney- 
man as chairman, has designated J. Wolfowitz as our Rietz lecturer for 1957. He 
will deliver the lecture at the 1957 annual meeting in Atlantic City. 

The Council has unanimously recommended, and the members present at the 
1956 membership meeting have unanimously voted, that the Kingston policy 
be the permanent policy of the Institute. This policy is: “It is the policy of the 
Institute of Mathematical Statistics that all its meetings shall be held on a 
completely non-segregated basis. In particular prior to determining the place of 
a forthcoming meeting, the Secretary of the IMS shall ascertain that meeting 
halls, eating facilities and housing accommodations adequate for the expected 
attendance will be available on a non-segregated basis, and that all social events 
connected with the meetings shall be non-segregated.” We have not experi-
enced, and do not anticipate, substantial difficulty in conforming to this policy.

We have reached an agreement with the University of Chicago Press, under 
which the Institute will cooperate with the Press in publishing, at no cost to the 
Institute, a series of statistical monographs, which will be available to IMS mem- 
bers at a 4 discount on publication orders. You will be notified when monographs 
are to appear. 

You have already been notified of the arrangement with the University of 
California Press, under which IMS members may purchase volumes of the 
Proceedings of the Third Berkeley Symposium on Probability and Statistics at a 
25 per cent discount. 

The Committee on Special Invited Papers, under the chairmanship of William 
Kruskal, has arranged for the following three Special Invited Papers: 

1. W. J. Youden, “Experimental Designs for Industrial Research”; scheduled
for Central Regional Meeting, Detroit, September, 1956.
2. D. L. Wallace, on asymptotic approximations to distributions; scheduled
for Annual Meeting, Atlantic City, September, 1957.
3. C. Hildreth, on statistical problems in economics; as yet unscheduled.

The 1956 nominating committee consists of Charles Stein, chairman, Joseph 
Berkson, Kai Lai Chung, John Curtiss, and David Kendall. 

L. J. Savage was elected president-elect for 1957, and T. W. Anderson, M. S. 
Bartlett, J. Berkson, E. L. Lehmann were elected to the Council for 1957-59. 

A list of 1956 Institute committees and representatives is included as an
appendix to this report. I know that all Institute members join me in expressing
gratitude to these members for so generously giving their time to the affairs of
the Institute.

In closing, I wish to congratulate our nine newly elected Fellows. They are
Allan Birnbaum 
D. R. Cox 
N. L. Johnson
Samuel Karlin 
W. H. Kruskal 
Walter L. Smith 
Milton Sobel 
Erling Sverdrup 
E. J. Williams 
DAVID BLACKWELL
President 


Appendix. 1956 IMS Committees and Representatives 
(The first name is that of the chairman) 


Academic Institutional Members: G. J. Lieberman, H. Robbins. 
Activities and Development: T. W. Anderson, J. Berkson, A. Bowker, T. E. Harris, W. 
Kruskal, S. Wilks. 
Exchanges: P. Dwyer, T. E. Harris (ex officio), G. E. Nicholson, Jr. (ex officio).
Fellows: H. Levene, F. Anscombe, M. Bartlett, D. Blackwell, L. Goodman, L. J. Savage. 
Finance: M. Spiegelman, A. Bowker, K. J. Arnold. 
Individual Membership: B. Epstein, E. Crow, D. Chapman, F. Grubbs, E. Pike. 
Non-Academic Institutional Members: Bernice Brown, A. Householder, P. Olmstead, C. C. 
Hand. 
Physical Facilities: Z. W. Birnbaum, Leo Katz, G. E. Nicholson, Jr. 
Professional Standards: B. F. Kimball, R. Burgess, C. Eisenhart, G. Harrington, A. House- 
holder, J. Lev, H. Marshall, R. Patton, J. E. Walsh. 
Program Committees: 
Annual: M. Peisakoff, H. Raiffa, H. Levene, M. Rosenblatt, D. G. Chapman, J. Kiefer,
B. Harshbarger, L. Katz (ex officio). 
Eastern: M. Zelen, R. L. Anderson, R. Bradley, G. Burrows, C. Derman, M. Halperin, 
L. Weiss, Leo Katz (ex officio). 
Central: S. Isaacson, F. C. Andrews, L. Cote, L. Goodman, J. Gurland, I. Olkin. 
Western: L. Moses, H. Rubin, Z. Birnbaum, B. Brown, M. Sandomire, G. Lieberman, 
J. Yerushalmy. 
Rietz Lecture: J. Neyman, H. Hotelling, W. Feller. 
Special Invited Papers: W. Kruskal, R. L. Anderson, K. Arrow, G. Cox, A. T. Craig, H. 
Hartley, I. Savage, T. E. Harris (ex officio). 
Mathematical Tables: J. W. Tukey, R. L. Anderson, A. Bowker, E. Cureton, W. Dixon,
C. Dunnett, C. Eisenhart, J. A. Greenwood, H. Hartley, E. Kaplan, W. Kruskal, D. B.
Owen, D. Teichroew, Max Woodbury. 


Fast Machines: R. L. Anderson, F. S. Acton, K. J. Arnold, A. S. Householder, C. F. Cos-
sack, W. H. Kruskal, W. J. Merrill, H. A. Meyer, J. Moshman, H. W. Norton, G. J.
Resnikoff, R. Slimak, Z. Szatrowski, D. Teichroew. 

1957 Summer Institute: T. W. Anderson, H. Scheffé, J. W. Tukey, J. Cornfield, O. Kemp-
thorne.


Inviting Russian Probabilists: E. Lukacs, D. Blackwell, J. L. Doob, J. Neyman, H. Robbins.
1956 Nominating Committee: C. Stein, J. Berkson, K. L. Chung, J. Curtiss, D. G. Kendall. 


Representatives 


American Association for the Advancement of Science: H. Hotelling (through 1957) 
Intersociety Committee on Standardization of Nomenclature and Symbols: H. Raiffa. 
National Research Council: S. S. Wilks (through 1957).

Policy Committee for Mathematics: J. F. Daly. 

Council of Population and Housing Census Users: Paul Meier 




REPORT OF THE SECRETARY FOR 1956 


During 1956 the Institute held its 69th through 72nd meetings. A business 
meeting was held during the 71st (19th Annual) meeting. The Program Com- 
mittees are to be congratulated on the excellent programs which have been 
arranged under the immediate direction of M. P. Peisakoff, Stanley Isaacson, 
and Gottfried E. Noether with the overall guidance of our Program Coordinator, 
Leo Katz. The Assistant Secretaries, M. R. Wilk, D. L. Wallace, D. G. Chapman,
and J. E. Lanahan, are to be congratulated on the physical arrangements, and
the Associate Secretaries, Allan Birnbaum, William Kruskal, and C. H. Kraft, 
on their performance of the duties of the Secretary with respect to the meetings. 

GEORGE E. NICHOLSON, JR.
Secretary 




INTERIM EDITOR’S REPORT FOR 1956 


The volume of material submitted to the Annals (in terms of manuscript 
pages) has shown an upward trend. In the year August 1, 1954—July 31, 1955 it 
was higher than in any preceding year. In the following year, the one just ending, 
it was somewhat less, but still higher than any of the previous years. The 1956 
Annals is accordingly being expanded, the Council having authorized 1225 pages 
for the year; the increase in size will enable the backlog of accepted papers to be 
brought below one issue. The Council has also authorized a 1957 volume of 1100 
pages. 

Thanks are due to many people, other than the regular members of the editorial 
staff, for generous refereeing assistance and other help. Specific acknowledgment 
will be made in the final report for the 1956 volume, which will appear in the
March, 1957 issue.

PUBLICATIONS RECEIVED


HERDAN, G., Language as Choice and Chance, P. Noordhoff, Ltd., Publishers, Groningen,
Holland, $8.00.

LANING, J. HALCOMBE, JR., and RICHARD H. BATTIN, Random Processes in Automatic Con-
trol, McGraw-Hill Series in Control Systems Engineering, McGraw-Hill Book Com-
pany, Inc., New York, 1956, 434 pp., $10.00.

PETTERSSEN, SVERRE, Weather Analysis and Forecasting, Second Edition, Volume I, Mo-
tion and Motion Systems, McGraw-Hill Book Company, Inc., New York, 1956, 428 pp.,
$8.50.

The Biological Effects of Atomic Radiation, A Report to the Public, National Academy of 
Sciences—National Research Council, Washington, D. C., 1956, 40 pp. 

The Biological Effects of Atomic Radiation, Summary Reports, National Academy of Sciences 
—National Research Council, Washington, D. C., 1956, 108 pp. 




INSTITUTIONAL MEMBERS 


BELL TELEPHONE LABORATORIES, INC., TECHNICAL LIBRARY, 463 West Street, New York
14, New York.

INTERNATIONAL BUSINESS MACHINES CORPORATION, New York

IOWA STATE COLLEGE, STATISTICAL LABORATORY, Ames, Iowa

MASSACHUSETTS INSTITUTE OF TECHNOLOGY, HAYDEN LIBRARY, PERIODICAL DEPARTMENT,
Cambridge 39, Massachusetts

MICHIGAN STATE COLLEGE, DEPARTMENT OF MATHEMATICS, East Lansing, Michigan

PRINCETON UNIVERSITY, DEPARTMENT OF MATHEMATICS, SECTION OF MATHEMATICAL
STATISTICS, Princeton, New Jersey

PURDUE UNIVERSITY LIBRARIES, Lafayette, Indiana

STATE UNIVERSITY OF IOWA, Iowa City, Iowa

UNIVERSITY OF CALIFORNIA, STATISTICAL LABORATORY, Berkeley, California

UNIVERSITY OF ILLINOIS, Urbana, Illinois

UNIVERSITY OF NORTH CAROLINA, INSTITUTE OF STATISTICS, Chapel Hill, North Carolina

UNIVERSITY OF WASHINGTON, LABORATORY OF STATISTICAL RESEARCH, Seattle, Washington 





ESTADISTICA 


Journal of the Inter American Statistical Institute 


Volume XIV, No. 52 Contents September 1956 
ARTICLES 
Aplicaciones Estadísticas en las Ciencias Físicas en los Estados Unidos (traducción)
    WILLIAM R. PABST, JR.
Algunos Errores en la Declaración de Edad en los Censos de Población de 1950 en Centro
    América y México    JORGE ARIAS B.
Aproximación de la Tasa Anual Promedio de Cambio    JACOB S. SIEGEL
Sources, Procedures of Compilation, and Types of Current Industrial Statistics in Canada
    DOMINION BUREAU OF STATISTICS
Programación del Desarrollo Estadístico Nacional
    OFICINA DE ESTADÍSTICA DE LAS NACIONES UNIDAS
Collection of Mental Disease Statistics in the United States    MORTON KRAMER
Problemas en la Aplicación del Muestreo en Encuestas Agropecuarias en la América Latina
    OFICINA DE ESTADÍSTICA DE LA FAO
Problemas Encontrados en Estudios de Gastos de la Familia Hechos Recientemente en Países
    Latinoamericanos    PAULINE B. PARO
Special Features. Legal Provisions. International Resolutions relating to Statistics. Institute
Affairs. Statistical News. Publications.


Published quarterly Annual subscription price $3.00 (U. S.) 


INTER AMERICAN STATISTICAL INSTITUTE 


Pan American Union 
Washington 6, D.C. 


ECONOMETRICA 


Journal of the Econometric Society 
Contents of Vol. 24, No. 4 - October, 1956 


HOLLIS B. CHENERY AND KENNETH S. KRETSCHMER .... Resource Allocation for Economic Development

JOHN C. H. FEI .... A Fundamental Theorem for the Aggregation Problem of Input-Output Analysis

JEROME K. PERCUS AND LEON QUINTO .... Application of Linear Programming to Competitive Bond Bidding

P. J. VERDOORN .... Complementarity and Long-Range Projections

HARVEY WAGNER .... An Eclectic Approach to the Pure Theory of Consumer Behavior

CONRAD A. BLYTH .... The Theory of Capital and Its Time Measures

J. D. SARGAN .... A Note on Mr. Blyth’s Article

MAURICE McMANUS .... On Hatanaka’s Note on Consolidation

E. BURGER .... On the Stability of Certain Economic Systems

BOOK REVIEWS

The Alphabet of Economic Science (Philip H. Wicksteed). Review by George J. Stigler 

Elementi di politica economica razionale (Eraldo Fossati). Review by Gerhard Tintner 

Business Concentration and Price Policy (NBER). Review by Merton J. Peck 

Einführung in die Betriebswirtschaftslehre (M. Lohmann). Review by Eberhard Fels

Theory of Games as a Tool for the Moral Philosopher (R. B. Braithwaite). Review by —— Hawkins 

Etude Econométrique de la Demande de Tabac (Roger-Paul Congard). Review by R. L. Basmann 

Monopoly and Competition and Their Regulation (E. H. Chamberlin, Ed.). Review by F. H. Hahn 

Die Landwirtschaft in der volkswirtschaftlichen Entwicklung. Eine Betrachtung über Beschäftigung und Ein-
kommen (H. Weber). Review by Kaethe Mengelberg

Marginal-Cost Price-Output (Burnham P. Beckwith). Review by Nancy D. Ruggles 

Die Nachfrage nach Nahrungsmitteln und ihre Abhängigkeit von Preis- und Einkommensänderungen: Eine
ökonometrische Untersuchung von Wirtschaftsrechnungen Hamburger Angestellten- und Arbeiterhaushaltungen
(Heinz Gollnick). Review by Eberhard Fels


Published Quarterly Subscription rates available on request 


The Econometric Society is an international society for the advancement of economic theory in its 
relation to statistics and mathematics 

Subscriptions to Econometrica and inquiries about the work of the Society and the procedure in applying 
for membership should be addressed to Richard Ruggles, Secretary, The Econometric Society, Box 
1264, Yale University, New Haven, Connecticut. 





BIOMETRIKA 


Volume 43 Contents Parts 3 and 4, December 1956 


ROYSTON, ERICA. Studies in the history of probability and statistics. III. A note on the history of the
graphical presentation of data. WILLIAMS, C. B. Studies in the history of probability and statistics. IV.
A note on an early statistical study of literary style. WALKER, A. M. A goodness of fit test for spectral
distribution functions of stationary time series with normal residuals. GANI, J. Sufficiency conditions
in regular Markov chains and certain random walks. DERMAN, C. Some asymptotic distribution theory
for Markov chains with a denumerable number of states. LAWLEY, D. N. A general method for approximating
to the distribution of likelihood ratio criteria. JAMES, G. S. On the accuracy of weighted means
and ratios. BAILEY, N. T. J. On estimating the latent and infectious periods of measles. II. Families
with three or more susceptibles. BAILEY, N. T. J. Significance tests for a variable chance of infection in
chain-binomial theory. WHITTLE, P. On the variation of yield variance with plot size. WATSON,
G. S. and WILLIAMS, E. J. On the construction of significance tests on the circle and the sphere. QUENOUILLE,
M. H. Notes on bias in estimation. ROY, S. N. & MITRA, S. K. An introduction to some
non-parametric generalizations of analysis of variance and multivariate analysis. KAMAT, A. R. A two-sample
distribution-free test. BARTON, D. E. Addendum. The limiting distribution of Kamat’s test
statistics. RAY, W. D. Sequential analysis applied to certain experimental designs in the analysis of variance.
BROADBENT, S. R. Lognormal approximation to products and quotients. BLISS, C. I., COCHRAN,
W. G. and TUKEY, J. W. A rejection criterion based upon the range. CROW, E. L. Confidence
intervals for a proportion.
Miscellanea 

Contributions by ANSCOMBE, F. J., BLANK, A. A., COX, D. R., DAVID, F. N., DAVID, H. A., DOUGLAS,
J. B., GILBERT, N. E. G., Hurtry, - A., MOORE, P. G., REIERSØL, O., RUBEN, H., STUART, A., TAYLOR,
J., WATSON, G. S.
Corrigenda 

GANI, J.; and Tracts for Computers XXVI


Reviews Other Books Received 


The subscription price, payable in advance, is 45s. inland, 54s. export (per volume including postage). Cheques
should be drawn to Biometrika and sent to “The Secretary, Biometrika Office, Department of Statistics,
University College, London, W.C. 1.” All foreign cheques must be in sterling and drawn on a bank
having a London agency.


MATHEMATICAL REVIEWS 


A journal containing reviews of the mathematical liter-
ature of the world, with full subject and author indices


Publication of this journal is sponsored by the American Mathe-
matical Society, Mathematical Association of America, Institute of
Mathematical Statistics, London Mathematical Society, Edinburgh 
Mathematical Society, Union Matematica Argentina, and others 


Subscriptions accepted to cover the calendar year only. 
Issues appear monthly except July. $20.00 per year. 


Send subscription order or request for sample copy to 


AMERICAN MATHEMATICAL SOCIETY 
80 Waterman Street, Providence 6, Rhode Island 





JOURNAL OF THE 
ROYAL STATISTICAL SOCIETY 


Series B (Methodological)
Vol. XVIII, No. 1, 1956 


Some Tests of Significance with Ordered Variables .... F. N. DAVID AND N. L. JOHNSON. (With Discussion)

Economic Choice of the Amount of Experimentation .... P. M. GRUNDY, M. J. R. HEALY, AND D. H. REES. (With Discussion)

On a Test of Significance in Pearson’s Biometrika Table (No. 11) .... R. A. FISHER.

A Test of Significance for an Unidentifiable Relation .... P. A. P. MORAN.